Electronic Journal of Statistics 

Vol. 2 (2008) 993-1020 

ISSN: 1935-7524 

DOI: 10.1214/07-EJS127 



Adaptive estimation of linear 
functionals by model selection 

Béatrice Laurent

Institut de Mathématiques (UMR 5219), INSA de Toulouse, Université de Toulouse, France
e-mail: beatrice.laurent@insa-toulouse.fr

Carenne Ludeña

IVIC, Venezuela
e-mail: cludena@euclides.ivic.ve

Clémentine Prieur

Institut de Mathématiques (UMR 5219), INSA de Toulouse, Université de Toulouse, France
e-mail: clementine.prieur@insa-toulouse.fr

Abstract: We propose an estimation procedure for linear functionals based on Gaussian model selection techniques. We show that the procedure is adaptive, and we give a non-asymptotic oracle inequality for the risk of the selected estimator with respect to the $L^p$ loss. An application to the problem of estimating a signal or its $r$th derivative at a given point is developed, and minimax rates are proved to hold uniformly over Besov balls. We also apply our non-asymptotic oracle inequality to the estimation of the mean of the signal on an interval whose length depends on the noise level. Simulations are included to illustrate the performance of the procedure for the estimation of a function at a given point. Our method provides a pointwise adaptive estimator.

AMS 2000 subject classifications: 62G05, 62G08. 

Keywords and phrases: Nonparametric regression, white noise model, 
adaptive estimation, linear functionals, model selection, pointwise adaptive 
estimation, oracle inequalities. 

Received October 2007. 



1. Introduction 

We consider the following model:

$$Y(t) = \langle s, t\rangle + \frac{\sigma}{\sqrt{n}}L(t), \quad \text{for all } t \in \mathbb{H}, \qquad (1.1)$$

where $\mathbb{H}$ is a separable Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$ and $L$ is some centered isonormal Gaussian process, which means that $L$ maps $\mathbb{H}$ isometrically onto some Gaussian subspace of $L^2(\Omega)$, where $(\Omega, \mathcal{G}, \mathbb{P})$ is some canonical probability space. This framework includes the finite-dimensional Gaussian regression model, the Gaussian sequence model and the multivariate white noise model.
Let $T$ be a linear functional defined over $\mathcal{S} \subset \mathbb{H}$. In this paper we consider the problem of estimating $T(s)$ based on the observation of $(Y(t), t \in \mathbb{H})$. Our main goal is to develop procedures which adapt to the smoothness of the underlying function $s$, in the framework of model selection as proposed by Barron et al. [3].

Minimax theory for estimating linear functionals is well developed in the Gaussian setting. Ibragimov and Hasminskii [16] obtained the best minimax linear estimator over classes of smooth functions. For any convex parameter space $\mathcal{F}$, the minimax mean squared error is of the order of the modulus of continuity of the functional over $\mathcal{F}$ (see Donoho and Liu [13], Donoho [12] and Cai and Low [8, 9], the latter being a generalization to certain non-convex parameter spaces). These authors have also constructed procedures whose maximum risk is close, up to a small factor, to the minimax rates. However, these rates cannot be attained when dealing with adaptive estimation over several classes of parameters. For the problem of estimating a function at a given point, Lepski [19] showed that it is necessary to include a logarithmic factor in the mean squared error when dealing simultaneously with two Lipschitz classes. For general parameter spaces, Cai and Low [10, 11] show that it is necessary to include a between-class modulus of continuity to quantify precisely the degree of adaptability for the estimation of a linear functional with respect to the mean squared error. They also proposed an adaptive estimator based on multiple tests over an ordered sequence of parameter spaces. Their methodology thus resembles "Lepski's method" (see for example [19, 20, 21]) in the sense that the estimation procedure chooses the best possible estimator over a finite selection of parameter spaces. This point of view is also developed in Klemelä and Tsybakov [17], where the authors construct an asymptotically sharp adaptive estimator of $T(s)$ based on kernel methods. They assume that the signal $s$ belongs to a class of regular functions whose index of regularity is bounded from above and below by known constants. Lepski and Spokoiny [23] and Lepski, Mammen and Spokoiny [22] propose methods based on kernel estimates with a variable bandwidth selector for pointwise adaptive estimation in the Gaussian white noise model.

Model selection methods for adaptive estimation were initiated in a series of papers by Birgé and Massart (see for example Birgé and Massart [5, 6], Barron, Birgé and Massart [3]). These methods have been used by Baraud [1, 2] in the framework of regression with fixed or random design to estimate the regression function. In this article, following Birgé [4], we take a model selection point of view on adaptive estimation via Lepski's method. In order to construct the adaptive estimator of the linear functional, we choose among an ordered family of finite-dimensional linear subspaces of $\mathbb{H}$. Over each subspace we consider an estimator based on projection methods, and the problem thus consists in establishing the best possible procedure for selecting the subspace. The main issue here is that, unlike in the case of penalized least squares, the bias of the estimator is not a monotonically decreasing sequence over the family of nested subspaces. Hence it is necessary to modify the procedure developed by Birgé [4] in order to obtain an appropriate estimator of the bias. The main advantage of our formulation is that it allows us to obtain non-asymptotic oracle inequalities for general linear functionals. In the framework of the Gaussian sequence model

$$Y_i = \theta_i + \sigma_i\xi_i, \quad i \in \mathbb{N},$$

where the $\xi_i$'s are i.i.d. standard Gaussian variables, Golubev and Levit [14] obtain an oracle inequality for the estimation of a general linear functional of $\theta = (\theta_i)_{i\in\mathbb{N}}$. They assume that the $\theta_i$'s are independent centered Gaussian variables, which is not the framework that we consider in the present paper.

We propose a general method to estimate a linear functional of $s$ in the framework of Model (1.1). We apply this general procedure to estimate the value of the $r$th derivative of a function at a point. For this problem, we provide minimax rates, uniformly over Besov balls, that correspond to the rates established by Lepski [19]. We also give an application of our procedure to pointwise adaptive estimation in a multidimensional framework. Moreover, since we have obtained a non-asymptotic oracle inequality, we are able to apply our result to the estimation of linear functionals that depend on the noise level (or on the number of observations). In the white noise model, we consider the estimation of the mean of the signal on an interval whose length depends on the noise level. The interesting fact in this case is that we obtain two kinds of rates of convergence, according to the relationship between the length of the interval, the noise level and the regularity of the signal. When the interval is too short, this problem is as hard as estimating the signal at some fixed point, and when the interval is long, the functional can be estimated at the parametric rate $1/\sqrt{n}$. All intermediate rates are obtained as the length of the interval grows. We present simulation results for the estimation of a function at a point, and we compare our method to a global (not pointwise) model selection estimator and to an estimator based on wavelet shrinkage. Our method provides a locally adaptive estimator of a regression function $s$ on $[0, 1]$. It performs well when estimating functions that oscillate strongly over some regions and are nearly flat over others.

The article is organized as follows. In Section 2 we present the framework, the estimation procedure and our main result. In Section 3 we develop three examples: estimating the value of the $r$th derivative of a function at a point using a multiresolution analysis, estimating the mean of the signal on an interval whose length depends on the noise level, and estimating the value of a multidimensional function at a point. In Section 4 we present the simulation study. Proofs of our main results are given in Section 5.

2. Main results 
2.1. The framework 

Given some separable Hilbert space $\mathbb{H}$ endowed with the scalar product $\langle\cdot,\cdot\rangle$, one observes $(Y(t), t \in \mathbb{H})$ as defined by Model (1.1). Since $L$ is a centered isonormal Gaussian process defined on $\mathbb{H}$, we have, for all $t, u \in \mathbb{H}$, $\mathrm{Cov}(L(t), L(u)) = \langle t, u\rangle$.
Let us first consider three particular cases of Model (1.1).

The finite-dimensional Gaussian regression.

One observes

$$Y_i = s_i + \sigma\varepsilon_i, \quad i = 1, \ldots, n,$$

where $(\varepsilon_1, \ldots, \varepsilon_n)$ are independent standard normal variables.

We consider $\mathbb{H} = \mathbb{R}^n$ endowed with the scalar product $\langle x, y\rangle = \frac{1}{n}\sum_{i=1}^n x_i y_i$ and set $s = (s_1, \ldots, s_n)$.

Model (1.1) is obtained by setting, for all $t = (t_1, \ldots, t_n) \in \mathbb{R}^n$, $Y(t) = \frac{1}{n}\sum_{i=1}^n t_i Y_i$ and $L(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n t_i\varepsilon_i$.

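The Gaussian sequence model.

In the Gaussian sequence model, one observes

$$Y_\lambda = \beta_\lambda + \frac{\sigma}{\sqrt{n}}\varepsilon_\lambda, \quad \lambda \in \mathbb{N}^*, \qquad (2.2)$$

where $(\varepsilon_\lambda)_{\lambda\in\mathbb{N}^*}$ is a sequence of independent standard normal variables.

Setting $\mathbb{H} = \ell^2(\mathbb{N}^*)$ endowed with the usual scalar product $\langle\beta, \gamma\rangle = \sum_{\lambda\in\mathbb{N}^*}\beta_\lambda\gamma_\lambda$ and $s = (\beta_\lambda)_{\lambda\in\mathbb{N}^*}$, we define, for any $t = (\alpha_\lambda)_{\lambda\in\mathbb{N}^*} \in \mathbb{H}$, $Y(t) = \sum_{\lambda\in\mathbb{N}^*}\alpha_\lambda Y_\lambda$ and $L(t) = \sum_{\lambda\in\mathbb{N}^*}\alpha_\lambda\varepsilon_\lambda$, and we see that (2.2) is a particular case of Model (1.1).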

The multivariate white noise model.

One observes

$$Z(x) = \int_{[0,1]^d}\mathbf{1}_{[0,x_1]\times\cdots\times[0,x_d]}(u)\,s(u)\,du + \frac{\sigma}{\sqrt{n}}W(x)$$

for all $x = (x_1, \ldots, x_d) \in [0,1]^d$, where $W$ is the standard Wiener process on $[0,1]^d$. We consider $\mathbb{H} = L^2([0,1]^d)$ endowed with its usual scalar product.

We set $Y(t) = \int_{[0,1]^d} t(u)\,dZ(u)$ and $L(t) = \int_{[0,1]^d} t(u)\,dW(u)$.

Our purpose is to propose new adaptive estimators of $T(s)$, where $T$ is a linear functional, from the observation (1.1).

2.2. The estimation procedure 

We consider a finite or countable collection $(S_m, m \in \mathcal{M})$ of linear subspaces of $\mathbb{H}$. For all $m \in \mathcal{M}$, we can define the estimator $\hat s_m$ of $s$, which is the projection estimator of $s$ onto $S_m$. Given some orthonormal basis $(\phi_\lambda, \lambda \in \Lambda_m)$ of $S_m$, it is natural to consider the projection estimator

$$\hat s_m = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)\phi_\lambda.$$

It is easy to verify that

$$\hat s_m = \arg\min_{v\in S_m}\left(\|v\|^2 - 2Y(v)\right),$$

which shows that $\hat s_m$ does not depend on the particular choice of the basis.

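It is natural to estimate $T(s)$ by $T(\hat s_m)$. Let $s_m$ denote the orthogonal projection of $s$ onto $S_m$. Since $T$ is a linear functional,

$$\mathbb{E}(T(\hat s_m)) = T(s_m).$$

Hence, the quadratic risk of the estimator $T(\hat s_m)$ can be decomposed into a bias term and a variance term:

$$\mathbb{E}\left[(T(\hat s_m) - T(s))^2\right] = (T(s_m) - T(s))^2 + \mathbb{E}\left[(T(\hat s_m) - T(s_m))^2\right].$$

The variance term can be easily computed by using the properties of the isonormal process $L$: since $T(\hat s_m) - T(s_m) = \frac{\sigma}{\sqrt{n}}\sum_{\lambda\in\Lambda_m} L(\phi_\lambda)T(\phi_\lambda)$ and the variables $L(\phi_\lambda)$, $\lambda \in \Lambda_m$, are i.i.d. standard Gaussian,

$$\mathbb{E}\left[(T(\hat s_m) - T(s_m))^2\right] = \frac{\sigma^2}{n}\sum_{\lambda\in\Lambda_m} T^2(\phi_\lambda) := \sigma_m^2.$$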



Our aim is to find an estimator among the collection $(T(\hat s_m), m \in \mathcal{M})$ that minimizes the quadratic risk

$$(T(s_m) - T(s))^2 + \sigma_m^2.$$
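The bias-variance decomposition and the formula for $\sigma_m^2$ can be checked numerically in the Gaussian sequence model (2.2). The sketch below is an illustration only: it truncates the sequence to finitely many coordinates, takes the functional $T(\beta) = \sum_\lambda a_\lambda\beta_\lambda$ for a finite weight vector $a$, uses as projection estimator the first $m$ coordinates of $Y$, and compares a Monte Carlo estimate of the quadratic risk with $(T(s_m) - T(s))^2 + \sigma_m^2$. The function name and the example values are assumptions made for the example.

```python
import math
import random


def projection_risk_check(beta, a, m, sigma, n, n_rep=20000, seed=0):
    """Monte Carlo estimate of E[(T(hat s_m) - T(s))^2] versus
    (T(s_m) - T(s))^2 + sigma_m^2 in the sequence model
    Y_l = beta_l + sigma / sqrt(n) * eps_l (finitely many coordinates).

    T(beta) = sum_l a_l * beta_l; hat s_m keeps the first m coordinates of Y."""
    rng = random.Random(seed)
    noise_sd = sigma / math.sqrt(n)
    t_true = sum(al * bl for al, bl in zip(a, beta))       # T(s)
    t_proj = sum(a[l] * beta[l] for l in range(m))          # T(s_m)
    sigma_m2 = sigma ** 2 / n * sum(a[l] ** 2 for l in range(m))
    theory = (t_proj - t_true) ** 2 + sigma_m2
    mse = 0.0
    for _ in range(n_rep):
        # T(hat s_m) = sum_{l < m} a_l * Y_l
        t_hat = sum(a[l] * (beta[l] + noise_sd * rng.gauss(0.0, 1.0)) for l in range(m))
        mse += (t_hat - t_true) ** 2
    return mse / n_rep, theory


if __name__ == "__main__":
    beta = [1.0 / (l + 1) ** 2 for l in range(50)]
    print(projection_risk_check(beta, [1.0] * 50, m=10, sigma=1.0, n=100))
```

With these values the variance term is $\sigma_m^2 = 0.1$ and the squared bias is about $0.0057$; the Monte Carlo estimate should agree with their sum up to simulation error.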

Model selection by penalized criteria was introduced by Barron et al. [3] and used in the framework defined by Model (1.1) for the estimation of the whole object $s$ by Birgé and Massart [5], and by Laurent and Massart [18] for the estimation of quadratic functionals of $s$. Usually, the bias term, or the sum of this bias term and a term that does not depend on $m \in \mathcal{M}$, is estimated, and the methods proposed in previous papers consist in minimizing over $m \in \mathcal{M}$ this estimate of the bias term plus some penalty term $\mathrm{pen}(m)$, which has to be suitably chosen. For example, when one estimates $s$, the bias term appearing in the quadratic risk $\mathbb{E}(\|s - \hat s_m\|^2)$ equals $\|s - s_m\|^2$. By Pythagoras' theorem, this bias term equals $\|s\|^2 - \|s_m\|^2$. Hence, minimizing $\|s - s_m\|^2 + \mathrm{pen}(m)$ is equivalent to minimizing $-\|s_m\|^2 + \mathrm{pen}(m)$, and one can easily find an unbiased estimator of $-\|s_m\|^2$.

In our case, the bias term equals $(T(s_m) - T(s))^2$; this expression cannot be simplified as above, nor can it be estimated directly. Therefore, in order to use model selection by penalized criteria, we introduce a new criterion. We assume that $\mathcal{M}$ is a subset of $\mathbb{N}$. This implies in particular that $\mathcal{M}$ is ordered. The criterion introduced in Definition 1 aims at finding $m \in \mathcal{M}$ which minimizes

$$\sup_{j\geq m,\, j\in\mathcal{M}}|T(s_m) - T(s_j)| + \sigma_m.$$

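Definition 1 Let $(S_m, m \in \mathcal{M})$ be a finite or countable collection of linear subspaces of $\mathbb{H}$. For all $m \in \mathcal{M}$, let $(\phi_\lambda, \lambda \in \Lambda_m)$ be an orthonormal basis of $S_m$ and let

$$\hat s_m = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)\phi_\lambda.$$

We define, for all $m \in \mathcal{M}$,

$$\mathrm{pen}(m) = \sqrt{2x_m}\,\sigma_m,$$

where $(x_m, m \in \mathcal{M})$ is a sequence of nonnegative real numbers. We set, for all $j, m \in \mathcal{M}$,

$$\sigma_{j,m}^2 = \frac{\sigma^2}{n}\Big\|\sum_{\lambda\in\Lambda_m} T(\phi_\lambda)\phi_\lambda - \sum_{\lambda\in\Lambda_j} T(\phi_\lambda)\phi_\lambda\Big\|^2$$

and

$$H(j, m) = \sqrt{2x_{j,m}}\,\sigma_{j,m},$$

where $(x_{j,m}, (j,m) \in \mathcal{M}^2)$ is a sequence of nonnegative real numbers. We define, for all $m \in \mathcal{M}$,

$$\widehat{\mathrm{Crit}}(m) = \sup_{j\geq m,\, j\in\mathcal{M}}\left[|T(\hat s_m) - T(\hat s_j)| - H(j,m)\right] + \mathrm{pen}(m), \qquad (2.3)$$

and

$$\hat m = \inf\Big\{m \in \mathcal{M},\ \widehat{\mathrm{Crit}}(m) \leq \inf_{j\in\mathcal{M}}\widehat{\mathrm{Crit}}(j) + \frac{1}{n}\Big\}.$$

We estimate the linear functional $T(s)$ by $T(\hat s_{\hat m})$.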
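The selection rule of Definition 1 only needs the numbers $T(\hat s_m)$, $\sigma_m$, $\sigma_{j,m}$, $x_m$ and $x_{j,m}$. The following Python sketch computes $\widehat{\mathrm{Crit}}(m)$ and $\hat m$ from these quantities for a finite collection, assuming the models are indexed $0, \ldots, K-1$ in increasing order; the function name `select_model` and its argument layout are illustrative choices, not part of the paper.

```python
import math


def select_model(t_hat, sigma, sigma_jm, x, x_jm, n):
    """Sketch of the selection rule of Definition 1 for a finite ordered collection.

    Models are indexed 0, ..., K-1 in increasing order.
    t_hat[m]         -- T(hat s_m)
    sigma[m]         -- sigma_m
    sigma_jm[j][m]   -- sigma_{j,m} (only entries with j >= m are used)
    x[m], x_jm[j][m] -- nonnegative weights x_m and x_{j,m}
    n                -- sample size, used in the 1/n slack defining hat m
    Returns (hat m, [hat Crit(m) for all m])."""
    size = len(t_hat)
    crit = []
    for m in range(size):
        pen_m = math.sqrt(2.0 * x[m]) * sigma[m]
        # sup over j >= m of |T(hat s_m) - T(hat s_j)| - H(j, m);
        # the j = m term is -H(m, m), which is 0 when sigma_{m,m} = 0
        sup_term = max(
            abs(t_hat[m] - t_hat[j]) - math.sqrt(2.0 * x_jm[j][m]) * sigma_jm[j][m]
            for j in range(m, size)
        )
        crit.append(sup_term + pen_m)
    threshold = min(crit) + 1.0 / n
    # hat m: smallest index whose criterion is within 1/n of the minimum
    m_hat = next(m for m in range(size) if crit[m] <= threshold)
    return m_hat, crit
```

Taking the smallest admissible index, rather than an arbitrary minimizer, mirrors the infimum in the definition of $\hat m$ and makes the output well defined even when several models have (nearly) the same criterion.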

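In the following theorem, we give an upper bound for the risk of the estimator $T(\hat s_{\hat m})$ with respect to the $L^p$ loss.

Theorem 1 Let $\mathbb{H}$ be some separable Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$. One observes the Gaussian process $\{Y(t), t \in \mathbb{H}\}$, where $Y(t)$ is given by (1.1). Let $T$ be some linear functional defined on $\mathcal{S} \subset \mathbb{H}$. Let $\mathcal{M} \subset \mathbb{N}$ and let $(S_m, m \in \mathcal{M})$ be some finite or countable collection of linear subspaces of $\mathbb{H}$. Let $T(\hat s_{\hat m})$ be defined in Definition 1. For all $m \in \mathcal{M}$, let

$$\mathrm{Crit}(m) = \sup_{j\geq m,\, j\in\mathcal{M}}|T(s_j) - T(s_m)| + \mathrm{pen}(m).$$

Let $m^*$ be defined by

$$m^* = \inf\Big\{m \in \mathcal{M},\ \mathrm{Crit}(m) \leq \inf_{j\in\mathcal{M}}\mathrm{Crit}(j) + \frac{1}{n}\Big\}.$$

Then, for all $p \geq 1$, there exists some positive constant $C(p)$ depending on $p$ only such that

$$\mathbb{E}\left(|T(\hat s_{\hat m}) - T(s)|^p\right) \leq C(p)\left((\mathrm{Crit}(m^*))^p + |T(s_{m^*}) - T(s)|^p\right) + C(p)\Big(\sup_{j\leq m^*,\, j\in\mathcal{M}} H^p(m^*, j) + \sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p + \sum_{j\geq m^*,\, j\in\mathcal{M}} e^{-x_{j,m^*}}\sigma_{j,m^*}^p + \frac{1}{n^p}\Big). \qquad (2.4)$$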
Comments:

• In the definition of $\widehat{\mathrm{Crit}}(m)$ given in (2.3), we compare $T(\hat s_m)$ with the estimators $T(\hat s_j)$ for $j \geq m$. Our procedure shares this feature with the original method of Lepski [19, 20, 21, 23]. An important difference between our procedure and Lepski's, however, is that we dissociate in (2.3) the terms $H(j,m)$ and $\mathrm{pen}(m)$. This allows us to obtain non-asymptotic results based on Gaussian concentration inequalities.

Let us explain the main ideas underlying the definition of our estimator and how we obtain non-asymptotic oracle inequalities for general linear functionals. As mentioned above, our goal is to minimize the criterion

$$\mathrm{Crit}(m) = \sup_{j\geq m,\, j\in\mathcal{M}}|T(s_j) - T(s_m)| + \mathrm{pen}(m).$$

The first term in this expression is a bias term, and the second one is closely related to the standard deviation of $T(\hat s_m)$. The unknown criterion $\mathrm{Crit}(m)$ is estimated by $\widehat{\mathrm{Crit}}(m)$ defined by (2.3). This criterion involves the term $H(j,m)$, which is the standard deviation $\sigma_{j,m}$ of $T(\hat s_m) - T(\hat s_j)$ multiplied by $\sqrt{2x_{j,m}}$. For a suitable choice of $x_{j,m}$, we prove that, with high probability,

$$\sup_{j\geq m,\, j\in\mathcal{M}}\left[|T(\hat s_m) - T(\hat s_j)| - H(j,m)\right] \leq \sup_{j\geq m,\, j\in\mathcal{M}}|T(s_j) - T(s_m)|.$$

This implies that, with high probability,

$$\forall m \in \mathcal{M}, \quad \widehat{\mathrm{Crit}}(m) \leq \mathrm{Crit}(m).$$

Since $\hat m$ minimizes $\widehat{\mathrm{Crit}}(m)$ up to the additive term $1/n$, we obtain that, with high probability,

$$\widehat{\mathrm{Crit}}(\hat m) \leq \inf_{m\in\mathcal{M}}\mathrm{Crit}(m) + \frac{1}{n},$$

which leads to an oracle inequality. We show that, up to remainder terms of smaller order, the risk of the estimator $T(\hat s_{\hat m})$ with respect to the $L^p$ loss is as small as that of the "best" estimator of the collection. Our procedure is also easy to implement, as we will see in the simulation study.

• In order to prove Theorem 1, we do not have to assume that the family $(S_m, m \in \mathcal{M})$ is nested; we only need this family to be ordered. If, for example, $\mathbb{H} = L^2([0,1])$, we can mix spaces generated by several kinds of orthonormal bases, for example a wavelet basis, the Fourier basis and a spline basis. Let us explain how to extend our procedure to the case where we consider $L$ different bases. We set

$$\mathcal{M} = \{(l, m),\ l \in \mathcal{L},\ m \in \mathcal{M}_l\},$$

where $\mathcal{L} = \{1, 2, \ldots, L\}$ and $\mathcal{M}_l \subset \mathbb{N}$. For all $l \in \{1, 2, \ldots, L\}$ and $m \in \mathcal{M}_l$, we set

$$\widehat{\mathrm{Crit}}_l(m) = \sup_{j\geq m,\, j\in\mathcal{M}_l}\left[|T(\hat s_m) - T(\hat s_j)| - H(j,m)\right] + \mathrm{pen}(m).$$

We define

$$(\hat l, \hat m) = \inf\Big\{(l, m) \in \mathcal{M},\ \widehat{\mathrm{Crit}}_l(m) \leq \inf_{(k,j)\in\mathcal{M}}\widehat{\mathrm{Crit}}_k(j) + \frac{1}{n}\Big\},$$

where $\mathcal{M}$ is ordered by the lexicographical order.

• It would be simpler to define $\hat m$ as a minimizer of $\widehat{\mathrm{Crit}}(m)$, but such a minimizer may not exist or may not be unique. This explains why we add the term $1/n$ in the definition of $\hat m$ given in Definition 1.

In the next section, we derive from Theorem 1 adaptive results in the minimax sense for pointwise estimation and for the estimation of the mean of the signal on some interval.



3. Minimax results 

3.1. Pointwise adaptive estimation 

Assume we observe $(Y(u), u \in [0,1])$, which obeys the Gaussian white noise model

$$Y(u) = \int_0^u s(x)\,dx + \frac{\sigma}{\sqrt{n}}W(u), \quad u \in [0,1], \qquad (3.5)$$

where $s \in \mathbb{H} = L^2([0,1])$ and $W$ is a standard Brownian motion.

Let $r \geq 0$. We assume that $s^{(r)}$ exists and that $s^{(r)} \in C([0,1])$, the set of continuous functions on $[0,1]$. We consider the problem of estimating $T(s) = s^{(r)}(x_0)$ for some fixed $x_0 \in [0,1]$.

We introduce the following notation: let $(S_j, j \geq 0)$ be a multiresolution analysis with father wavelet $\varphi$ and mother wavelet $\psi$ (see for example [15]). Define

$$\varphi_{j,k}(x) = 2^{j/2}\varphi(2^j x - k), \quad x \in [0,1],\ j \geq 0,\ k \in \mathbb{Z};$$
$$\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k), \quad x \in [0,1],\ j \geq 0,\ k \in \mathbb{Z}.$$

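For all $m \geq 0$, $S_m$ denotes the linear space spanned by the functions $(\varphi_{m,k}, k \in \mathbb{Z})$. We recall that $s_m$ denotes the orthogonal projection of $s$ onto $S_m$,

$$s_m = \sum_{k\in\mathbb{Z}}\langle s, \varphi_{m,k}\rangle\varphi_{m,k},$$

and that $\hat s_m$ denotes the estimator of $s$ based on the model $S_m$,

$$\hat s_m = \sum_{k\in\mathbb{Z}}\left(\int_0^1\varphi_{m,k}(u)\,dY(u)\right)\varphi_{m,k}.$$

We will consider compactly supported wavelets $\varphi$, for which the above sums are finite.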
For all $\alpha > 0$ and $1 \leq q \leq +\infty$, the notation $B_{\alpha,\infty,q}([0,1])$ is used for the classical Besov space endowed with the norm $\|\cdot\|_{\alpha,\infty,q}$ (see for example [15, Definition 9.2]). We denote by $B_{\alpha,\infty,q}(L)$ the set of functions $s$ in $B_{\alpha,\infty,q}([0,1])$ such that $\|s\|_{\alpha,\infty,q} \leq L$.

Assume that the following conditions are satisfied by $\varphi$ and $\psi$:

(i) $\exists M > 0$ such that $\mathrm{supp}(\varphi)$ and $\mathrm{supp}(\psi)$ are included in $[-M, M]$.

(ii) $\exists K > 0$ such that $\|\varphi\|_\infty \vee \|\psi\|_\infty \leq K$.

(iii) $\exists N \geq 0$ such that $\int x^n\psi(x)\,dx = 0$ for $n = 0, \ldots, N$.

(iv) $\varphi^{(r)}$ exists and is bounded on $\mathrm{supp}(\varphi)$ by $K_r$.

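Corollary 1 Let $d_n$ denote the integer part of $\ln(n)/\ln(2)$ and let $\mathcal{M} = \{1, \ldots, d_n\}$. For all $m \in \mathcal{M}$, let $S_m$ be the linear space spanned by the functions $(\varphi_{m,k}, k \in \mathbb{Z})$.

For $p \geq 1$, we define

$$\forall m \in \mathcal{M}, \quad x_m = \frac{p}{2}\ln\left(2^{m(1+2r)}\right),$$
$$\forall (j,m) \in \mathcal{M}^2, \text{ if } j > m, \quad x_{j,m} = \frac{p}{2}\ln\left(2^{j(1+2r)} - 2^{m(1+2r)}\right), \qquad x_{m,m} = 0.$$

Let $\hat m$ be defined as in Definition 1. Let $r < \alpha$.

There exist some constants $C$ depending on $\alpha$, $p$, $q$ and $r$, and $C'(\sigma)$ depending on $\sigma$, such that, for any integer $n$ satisfying $nL^2/(\sigma^2\ln(n)) \geq 2^{1+2\alpha}$ and $n^{2\alpha}\sigma^2\ln(n) \geq L^2$,

$$\sup_{s\in B_{\alpha,\infty,q}(L)}\mathbb{E}\left(|\hat s_{\hat m}^{(r)}(x_0) - s^{(r)}(x_0)|^p\right) \leq C L^{\frac{p(1+2r)}{1+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p(\alpha - r)}{1+2\alpha}} + C'(\sigma)\frac{\ln n}{n^{p/2}}.$$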

Comments on the optimality of the result stated in Corollary 1 are given in 
Subsection 3.3, where the multidimensional case is considered. 
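The following short computation, a sketch under the assumption that the weights take the form $x_m = \frac{p}{2}\ln(2^{m(1+2r)})$, indicates why the remainder in Corollary 1 is of order $\ln(n)/n^{p/2}$. For $T(s) = s^{(r)}(x_0)$ one has $T(\varphi_{m,k}) = 2^{m(1/2+r)}\varphi^{(r)}(2^m x_0 - k)$, which vanishes unless $|2^m x_0 - k| \leq M$ by (i), so at most $2M+1$ indices $k$ contribute. Writing $K_r$ for the bound on $|\varphi^{(r)}|$ in (iv),

$$\sigma_m^2 = \frac{\sigma^2}{n}\sum_{k} T^2(\varphi_{m,k}) \leq (2M+1)K_r^2\,2^{m(1+2r)}\frac{\sigma^2}{n},$$

hence

$$\sum_{m=1}^{d_n} e^{-x_m}\sigma_m^p \leq \sum_{m=1}^{d_n} 2^{-\frac{mp(1+2r)}{2}}\left((2M+1)K_r^2\right)^{p/2}2^{\frac{mp(1+2r)}{2}}\frac{\sigma^p}{n^{p/2}} = d_n\left((2M+1)K_r^2\right)^{p/2}\frac{\sigma^p}{n^{p/2}} \leq \left((2M+1)K_r^2\right)^{p/2}\frac{\sigma^p\ln n}{\ln(2)\,n^{p/2}}.$$

The choice of $x_m$ thus exactly cancels the growth of $\sigma_m^p$ with $m$, and the sum over the $d_n \approx \log_2 n$ models produces the logarithmic factor.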



3.2. Estimation of the mean of the signal on an interval 

As above, we observe $(Y(u), u \in [0,1])$ defined by (3.5). We use the same notation as in Section 3.1. We now consider the problem of estimating the linear functional

$$T(s) = \frac{1}{H_n}\int_{I_{H_n}} s(x)\,dx,$$

where $I_{H_n}$ is an interval included in $[0,1]$ with length $H_n$ that may depend on $n$.

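Corollary 2 Let $(Y(u), u \in [0,1])$ be defined by (3.5). Let $I_{H_n}$ be some interval included in $[0,1]$ with length $H_n$. Let $T(s) = \int_{I_{H_n}} s(x)\,dx/H_n$. Let $m_n = \sup\{m \in \mathbb{N},\ 2^m \leq 1/H_n\}$. Let $\mathcal{M} = \{0, 1, \ldots, m_n + 1\}$. For all $m \in \mathcal{M}\setminus\{m_n + 1\}$, let $S_m$ be the linear space spanned by the functions $(\varphi_{m,k}, k \in \mathbb{Z})$, and let $S_{m_n+1}$ be the linear space spanned by the indicator function of the interval $I_{H_n}$.

For $p \geq 1$, we define

$$x_m = \frac{p}{2}\ln(2^m)\ \ \forall m \in \mathcal{M},\ m \leq m_n, \qquad x_{m_n+1} = \frac{p}{2}\ln(1/H_n),$$
$$x_{j,m} = x_j\ \ \forall (j,m) \in \mathcal{M}^2,\ j \neq m, \qquad x_{m,m} = 0.$$

Let $\hat m$ be defined as in Definition 1. Let $n$ be any integer satisfying $nL^2/(\sigma^2\ln(n)) \geq 1$.

Then, for all $\alpha > 0$, $q \geq 1$ and $p \geq 1$, there exist some constants $C$ depending on $\alpha$, $p$ and $q$, and $C'(\sigma)$ depending on $\sigma$, such that the following inequalities hold.

If $H_n \leq (\sigma^2\ln(n)/(nL^2))^{1/(1+2\alpha)}$,

$$\sup_{s\in B_{\alpha,\infty,q}(L)}\mathbb{E}\left(|T(\hat s_{\hat m}) - T(s)|^p\right) \leq C L^{\frac{p}{1+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p\alpha}{1+2\alpha}} + C'(\sigma)\frac{\ln n}{n^{p/2}}.$$

If $H_n > (\sigma^2\ln(n)/(nL^2))^{1/(1+2\alpha)}$,

$$\sup_{s\in B_{\alpha,\infty,q}(L)}\mathbb{E}\left(|T(\hat s_{\hat m}) - T(s)|^p\right) \leq C\left(\frac{\sigma^2}{nH_n}\right)^{p/2}\left(\ln(1/H_n)\right)^{p/2} + C'(\sigma)\frac{\ln n}{n^{p/2}}.$$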
Comments:

• The rates of convergence that we obtain depend on the relation between $H_n$, $n$, $\sigma$ and the regularity of the signal (via the parameters $\alpha$ and $L$). If $H_n > (\sigma^2\ln(n)/(nL^2))^{1/(1+2\alpha)}$, the best estimator is the "naive" estimator $\int_{I_{H_n}} dY(u)/H_n$, which is unbiased and achieves the rate $\sigma/\sqrt{nH_n}$. Note that in our result we lose a logarithmic term ($\ln(1/H_n)$) for the adaptation to the unknown regularity of the signal. When $H_n$ is independent of $n$, we recover the parametric rate $1/\sqrt{n}$ for the estimation of $T(s)$. If $H_n \leq (\sigma^2\ln(n)/(nL^2))^{1/(1+2\alpha)}$, we obtain the same rates for the estimation of $T(s)$ as for the estimation of the signal $s$ at one point. In this case, the "naive" estimator has too large a variance, and it pays to consider estimators which are biased but have a smaller variance.

• Our procedure is adaptive with respect to the unknown link between $H_n$ and the regularity of the signal, and allows us to obtain the optimal rates (up to logarithmic terms due to adaptation) in both cases, as explained below.

• It was possible to establish the upper bounds given in Corollary 2 because the result stated in Theorem 1 is non-asymptotic. We have indeed considered here a linear functional that depends on $n$.
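The variance of the "naive" estimator mentioned in the first comment can be checked directly from (3.5); the computation below is a worked check under the model as written. Since $\int_{I_{H_n}} dW(u)$ is a Gaussian variable with variance $H_n$,

$$\frac{1}{H_n}\int_{I_{H_n}} dY(u) = \frac{1}{H_n}\int_{I_{H_n}} s(x)\,dx + \frac{\sigma}{\sqrt{n}\,H_n}\int_{I_{H_n}} dW(u), \qquad \mathrm{Var}\Big(\frac{1}{H_n}\int_{I_{H_n}} dY(u)\Big) = \frac{\sigma^2}{n H_n^2}\,H_n = \frac{\sigma^2}{nH_n},$$

so the estimator is unbiased for $T(s)$ and its $L^p$ risk equals $\mathbb{E}|Z|^p\,(\sigma^2/(nH_n))^{p/2}$ with $Z \sim \mathcal{N}(0,1)$. Equating $\sigma^2/(nH_n)$ with the squared pointwise rate $L^{2/(1+2\alpha)}(\sigma^2/n)^{2\alpha/(1+2\alpha)}$ gives $H_n = (\sigma^2/(nL^2))^{1/(1+2\alpha)}$, which is, up to the $\ln(n)$ factor, the boundary between the two regimes of Corollary 2.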

Lower bounds: 

Using the results of Donoho and Liu [13], one can show that, up to logarithmic terms, the upper bounds given in Corollary 2 are optimal over Hölder balls. The lower bounds are given in the following lemma.

Lemma 1 Let $(Y(u), u \in [0,1])$ be defined by (3.5). Let $0 < H_n \leq 1/2$ and $I_{H_n} = [0, H_n]$; we consider the linear functional $T(s) = \int_{I_{H_n}} s(x)\,dx/H_n$. Set, for all $\alpha \in\, ]0,1]$ and $L > 0$,

$$\mathcal{H}_\alpha(L) = \left\{f : [0,1] \to \mathbb{R},\ \forall x, y \in [0,1],\ |f(x) - f(y)| \leq L|x - y|^\alpha\right\}.$$

If $H_n \geq ((1 + 2\alpha)/(2nL^2))^{1/(1+2\alpha)}$,

$$\inf_{\hat T_n}\sup_{s\in\mathcal{H}_\alpha(L)}\mathbb{E}\left((\hat T_n - T(s))^2\right) \geq \frac{C(\alpha,\sigma)}{nH_n}, \qquad (3.6)$$

where $C(\alpha, \sigma)$ is a constant depending on $\alpha$ and $\sigma$, and where the infimum is taken over all possible estimators.

If $H_n < ((1 + 2\alpha)/(nL^2))^{1/(1+2\alpha)}/2$ and $nL^2 \geq 1 + 2\alpha$,

$$\inf_{\hat T_n}\sup_{s\in\mathcal{H}_\alpha(L)}\mathbb{E}\left((\hat T_n - T(s))^2\right) \geq C(\alpha,\sigma)L^{2/(1+2\alpha)}n^{-2\alpha/(1+2\alpha)}. \qquad (3.7)$$

3.3. Multidimensional pointwise adaptive estimation 

One observes the Gaussian white noise model

$$Z(x) = \int_{[0,1]^d}\mathbf{1}_{[0,x_1]\times\cdots\times[0,x_d]}(u)\,s(u)\,du + \frac{\sigma}{\sqrt{n}}W(x)$$

for all $x = (x_1, \ldots, x_d) \in [0,1]^d$, where $W$ is the standard Wiener process on $[0,1]^d$.

We consider $\mathbb{H} = L^2([0,1]^d)$ endowed with its usual scalar product. We assume that $s \in C([0,1]^d)$, the set of continuous functions on $[0,1]^d$. Let $x_0 \in [0,1]^d$; we estimate $T(s) = s(x_0)$.

Our aim is to obtain adaptive results in the minimax sense over isotropic Hölder spaces defined as follows. For all $\alpha \in\, ]0,1]$ and $L > 0$, let

$$\mathcal{H}_\alpha(L) = \left\{f : [0,1]^d \to \mathbb{R},\ \forall x, y \in [0,1]^d,\ |f(x) - f(y)| \leq L\|x - y\|_\infty^\alpha\right\},$$

where $\|x - y\|_\infty = \sup_{1\leq i\leq d}|x_i - y_i|$. In order to estimate $T(s)$, we use the Haar basis of $L^2([0,1]^d)$.

For all $m \in \mathcal{M}$, let $S_m$ be the space of piecewise constant functions on the sets $\left[\frac{k_1}{2^m}, \frac{k_1+1}{2^m}\right[\times\cdots\times\left[\frac{k_d}{2^m}, \frac{k_d+1}{2^m}\right[$ for all $(k_1, \ldots, k_d) \in \{0, 1, \ldots, 2^m - 1\}^d$.

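Corollary 3 Let $d_n$ denote the integer part of $\ln(n)/(d\ln(2))$ and let $\mathcal{M} = \{1, \ldots, d_n\}$.

For $p \geq 1$, we define

$$\forall m \in \mathcal{M},\ x_m = \frac{p}{2}\ln(2^{md}), \qquad \forall (j,m) \in \mathcal{M}^2,\ x_{j,m} = \frac{p}{2}\ln(2^{jd}).$$

Let $\hat m$ be defined as in Definition 1. For all $\alpha \in\, ]0,1]$ and $L > 0$, the following inequality holds if $n/(\sigma^2\ln(n)) \geq 2^{d+2\alpha}L^{-2}$ and $L^2 \leq n^{2\alpha/d}\sigma^2\ln(n)$:

$$\sup_{s\in\mathcal{H}_\alpha(L)}\mathbb{E}\left(|\hat s_{\hat m}(x_0) - s(x_0)|^p\right) \leq C L^{\frac{pd}{d+2\alpha}}\left(\frac{\sigma^2\ln n}{n}\right)^{\frac{p\alpha}{d+2\alpha}} + C'(\sigma)\frac{\ln n}{n^{p/2}},$$

where $C$ is a constant depending on $p$, $\alpha$ and $d$, and $C'(\sigma)$ is a constant depending on $\sigma$.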

Comments. It follows from the results of Lepski [19] and Brown and Low [7] that the rates obtained in Corollary 1 and in Corollary 3 are optimal. These authors showed that the logarithmic loss which appears in the rate of convergence, compared with the minimax rates, is unavoidable for adaptive estimators.

The dependence of our upper bound for the risk on the radius $L$ of the Besov or Hölder balls is the sharp one obtained by Klemelä and Tsybakov [17].

4. Simulation study 

Throughout this section, we consider the finite-dimensional Gaussian regression model. The regression functions that we consider are defined on $[0, 1]$ by:

$$s_1(x) = (x^2 - x)\sin(6x),$$
$$s_2(x) = \exp(-30|x - 0.75|) + \exp(-30|x - 0.25|),$$
$$s_3(x) = x\cos(2\pi x)\,\mathbf{1}_{0\leq x\leq 2/3} + \cos(15\pi x)\,\mathbf{1}_{2/3<x\leq 1}.$$

The estimation is based on the simulations

$$Y_i = s_j\left(\frac{i}{n}\right) + \sigma\varepsilon_i, \quad i = 1, \ldots, n,\ j = 1, 2, 3, \qquad (4.8)$$

with $(\varepsilon_1, \ldots, \varepsilon_n)$ independent standard normal variables, $\sigma = 0.2$ and $n = 256$.

We set $d_n = \log_2(n) = 8$ and $\mathcal{M} = \{1, 2, \ldots, d_n\}$. The estimators are built using a wavelet basis: the Haar basis (denoted by H in the tables) or the Daubechies 20 basis (denoted by D20). In both cases, for all $m \in \mathcal{M}$, $S_m$ is the linear space spanned by the functions $(\varphi_{m,k}, k \in \mathbb{Z})$, where $\varphi_{m,k} = 2^{m/2}\varphi(2^m\cdot - k)$ and $\varphi$ is the father wavelet of the basis. The values reported in the tables must be divided by 100. All simulations were programmed in Matlab 7.3 with the WaveLab wavelet toolbox.



4.1. Pointwise estimation

We first consider the estimation of the linear functional $T(s) = s(x_i)$ for some fixed points $x_i \in [0, 1]$.
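When the Haar basis is used to construct the estimators, we obtain

$$\forall m \in \mathcal{M},\ \sigma_m^2 = 2^m\frac{\sigma^2}{n}, \qquad \forall 1 \leq m < j \leq d_n,\ \sigma_{j,m}^2 = (2^j - 2^m)\frac{\sigma^2}{n}.$$

We set

$$x_m = \frac{1}{2}\ln(2^m), \qquad x_{j,m} = \frac{1}{2}\ln(2^j - 2^m),$$

and

$$H(j,m) = (2^j - 2^m)^{1/2}\sqrt{2x_{j,m}}\,\frac{\sigma}{\sqrt{n}}, \qquad \mathrm{pen}(m) = \sqrt{2x_m}\,2^{m/2}\frac{\sigma}{\sqrt{n}}.$$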

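The choices of $H(j,m)$ and $\mathrm{pen}(m)$ given above correspond to the control of the $L^1$ risk in Corollary 1 ($p = 1$, $r = 0$) when we use the Haar basis. For these choices,

$$\sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m \leq d_n\frac{\sigma}{\sqrt{n}},$$

and, for all $m \in \mathcal{M}$,

$$\sum_{j>m,\, j\in\mathcal{M}} e^{-x_{j,m}}\sigma_{j,m} \leq d_n\frac{\sigma}{\sqrt{n}}.$$

The order of magnitude of both series, $\ln(n)/\sqrt{n}$, is smaller than the rates of convergence of $\mathbb{E}(|\hat s(x) - s(x)|)$ obtained in Corollary 1.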
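With the Haar basis, $\hat s_m(x)$ is simply the mean of the observations $Y_i$ whose design points $i/n$ lie in the same dyadic cell as $x$, so the selection rule with the Haar choices above (the procedure called P1 below) fits in a few lines. The Python sketch below uses $x_m = \frac{1}{2}\ln(2^m)$ (Corollary 1 with $p = 1$, $r = 0$), $x_{j,m} = \frac{1}{2}\ln(2^j - 2^m)$ and $\mathcal{M} = \{1, \ldots, d_n\}$ at a single design point $x = i_0/n$, together with a Monte Carlo estimate of $\mathbb{E}|\hat s(x) - s(x)|$. It assumes that $n$ is a power of 2 and that dyadic cells are left-open, so that each cell at level $m$ contains exactly $n/2^m$ design points. The function names are hypothetical, and this is not the Matlab/WaveLab code used for the tables.

```python
import math
import random


def p1_haar_pointwise(y, x, sigma):
    """Sketch of the Haar selection rule at the design point x = i0 / n.

    y: observations with y[i - 1] = Y_i, i = 1..n, n a power of 2.
    Returns (hat s_{hat m}(x), hat m)."""
    n = len(y)
    d_n = int(round(math.log2(n)))
    levels = range(1, d_n + 1)
    i0 = int(round(x * n))
    if not 1 <= i0 <= n:
        raise ValueError("x must be a design point i/n with 1 <= i <= n")
    j0 = i0 - 1                               # 0-based index of the design point
    t_hat, pen = {}, {}
    for m in levels:
        size = n // 2 ** m                    # design points per dyadic cell
        k = j0 // size                        # cell containing x (cells left-open)
        t_hat[m] = sum(y[k * size:(k + 1) * size]) / size   # hat s_m(x)
        sigma_m = sigma * math.sqrt(2 ** m / n)
        x_m = 0.5 * math.log(2 ** m)
        pen[m] = math.sqrt(2 * x_m) * sigma_m
    crit = {}
    for m in levels:
        sup_term = 0.0                        # contribution of j = m
        for j in levels:
            if j > m:
                sigma_jm = sigma * math.sqrt((2 ** j - 2 ** m) / n)
                x_jm = 0.5 * math.log(2 ** j - 2 ** m)
                h_jm = math.sqrt(2 * x_jm) * sigma_jm          # H(j, m)
                sup_term = max(sup_term, abs(t_hat[m] - t_hat[j]) - h_jm)
        crit[m] = sup_term + pen[m]
    threshold = min(crit.values()) + 1.0 / n
    m_hat = min(m for m in levels if crit[m] <= threshold)
    return t_hat[m_hat], m_hat


def mc_risk(s, x, sigma=0.2, n=256, n_rep=500, seed=1):
    """Monte Carlo estimate of E|hat s(x) - s(x)| under Y_i = s(i/n) + sigma*eps_i."""
    rng = random.Random(seed)
    grid = [s(i / n) for i in range(1, n + 1)]
    total = 0.0
    for _ in range(n_rep):
        y = [v + sigma * rng.gauss(0.0, 1.0) for v in grid]
        estimate, _ = p1_haar_pointwise(y, x, sigma)
        total += abs(estimate - s(x))
    return total / n_rep
```

For instance, `mc_risk(lambda t: math.exp(-30 * abs(t - 0.75)) + math.exp(-30 * abs(t - 0.25)), 0.25)` estimates the risk at the peak $x = 1/4$ of $s_2$; it uses far fewer replications than the $N = 5000$ of the study and is not meant to reproduce the table entries.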

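Our procedure is called P1. We compare the performance of our procedure with that of the estimator $\tilde s$ studied by Baraud [1] and defined as follows: let, for all functions $t$,

$$\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n\left(Y_i - t\left(\frac{i}{n}\right)\right)^2,$$
$$\tilde s = \hat s_{\tilde m}, \qquad \tilde m = \arg\min_{m\in\mathcal{M}}\left(\gamma_n(\hat s_m) + \mathrm{pen}'(m)\right),$$

with $\mathrm{pen}'(m) = 2\cdot 2^m\sigma^2/n$, which corresponds to Mallows' $C_p$ criterion. This procedure is called P2.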
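Procedure P2 selects a single level for the whole function by penalized least squares. A minimal Python sketch with the Haar basis is given below; it assumes that $n$ is a power of 2 and that the $2^m$ dyadic cells at level $m$ each contain $n/2^m$ consecutive design points, so that $\hat s_m$ evaluated at the design points is the vector of block means. The names `haar_projection` and `p2_mallows_level` are illustrative.

```python
import math


def haar_projection(y, m):
    """Fitted values of hat s_m at the design points: block means over 2^m equal cells."""
    n = len(y)
    size = n // 2 ** m
    fitted = []
    for k in range(2 ** m):
        block = y[k * size:(k + 1) * size]
        fitted.extend([sum(block) / size] * size)
    return fitted


def p2_mallows_level(y, sigma):
    """Level minimizing gamma_n(hat s_m) + 2 * 2^m * sigma^2 / n over m = 1..d_n."""
    n = len(y)
    d_n = int(round(math.log2(n)))
    best_crit, best_m = None, None
    for m in range(1, d_n + 1):
        fitted = haar_projection(y, m)
        gamma = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
        crit = gamma + 2 * 2 ** m * sigma ** 2 / n
        if best_crit is None or crit < best_crit:
            best_crit, best_m = crit, m
    return best_m
```

The P2 estimate at any point is then the block mean at the selected level; unlike the pointwise rule, this level is the same for every $x$.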

We also compare our procedure with a wavelet thresholding procedure, in which the wavelet coefficients whose absolute value is smaller than $\sigma\sqrt{2\ln(n)}$ are set to 0. This procedure is called P3.
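A minimal sketch of a hard-thresholding estimator in the spirit of P3 is given below. It assumes an orthonormal discrete Haar transform of the sample vector (so that each noisy coefficient has standard deviation $\sigma$), thresholds every detail coefficient at $\sigma\sqrt{2\ln(n)}$ and leaves the coarsest scaling coefficient untouched. These are our own implementation choices: the text does not specify which levels were thresholded in WaveLab, and the Daubechies 20 variant is not covered.

```python
import math


def haar_forward(v):
    """Orthonormal discrete Haar transform of a list whose length is a power of 2.

    Returns the coarsest scaling coefficient and the detail coefficients,
    ordered from the finest level to the coarsest one."""
    a = list(v)
    details = []
    while len(a) > 1:
        pairs = range(0, len(a), 2)
        details.append([(a[i] - a[i + 1]) / math.sqrt(2) for i in pairs])
        a = [(a[i] + a[i + 1]) / math.sqrt(2) for i in pairs]
    return a[0], details


def haar_inverse(a0, details):
    """Inverse of haar_forward."""
    a = [a0]
    for d in reversed(details):
        nxt = []
        for ak, dk in zip(a, d):
            nxt.append((ak + dk) / math.sqrt(2))
            nxt.append((ak - dk) / math.sqrt(2))
        a = nxt
    return a


def p3_hard_threshold(y, sigma):
    """Hard thresholding at sigma * sqrt(2 ln n): detail coefficients whose
    absolute value is below the threshold are set to 0."""
    n = len(y)
    thr = sigma * math.sqrt(2 * math.log(n))
    a0, details = haar_forward(y)
    kept = [[dk if abs(dk) >= thr else 0.0 for dk in d] for d in details]
    return haar_inverse(a0, kept)
```

Because the transform is orthonormal, setting `sigma` to 0 keeps every coefficient and returns the data unchanged, which is a convenient sanity check.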

In Figure 1, we represent the functions $s_1$, $s_2$, $s_3$ and one simulated sample of the noisy observations $(i/n, Y_i)$.

We estimate the pointwise risk in absolute value $\mathbb{E}(|\hat s(x) - s(x)|)$ for the estimation of $s_j(x)$ with procedures P1, P2 and P3. The estimate of the risk is based on $N = 5000$ simulations and is defined as

$$\hat r(x) = \frac{1}{N}\sum_{l=1}^N\left|\hat s^{(l)}(x) - s(x)\right|,$$

where $s = s_1$, $s_2$ or $s_3$ and $\hat s^{(l)}$ is the estimator of $s$ based on the $l$-th simulated sample. In the following tables, we give the values of $100\,\hat r$ at points 1/4, 1/3, 1/2 and 3/4 for $s_1$; 1/8, 1/4, 1/3 and 1/2 for $s_2$; and 1/4, 1/3, 1/2 and 7/8 for $s_3$.

Fig 1. Functions $s_1$, $s_2$, $s_3$, and one simulated sample, $n = 2^8$, $\sigma = 0.2$.



s1     x1 = 1/4          x2 = 1/3          x3 = 1/2          x4 = 3/4
       P1   P2   P3      P1   P2   P3      P1   P2   P3      P1   P2   P3
H      5.6  3.2  14.9    4.5  4.0  7.4     4.2  6.9  11.3    5.7  8.0  16.9
D20    5.4  5.4  5.8     2.6  3.0  3.0     6.5  6.5  2.0     6.6  6.6  7.5




s2     x1 = 1/8          x2 = 1/4          x3 = 1/3          x4 = 1/2
       P1   P2   P3      P1   P2   P3      P1   P2   P3      P1   P2   P3
H      3.8  6.3  3.2     23.3 27.8 30.4    4.7  6.3  4.8     3.5  6.1  3.0
D20    6.8  5.1  5.6     20.8 23.3 35.8    6.1  6.5  9.7     6.8  5.0  6.7



s3     x1 = 1/4          x2 = 1/3          x3 = 1/2          x4 = 7/8
       P1   P2   P3      P1   P2   P3      P1   P2   P3      P1   P2   P3
H      5.9  7.9  5.9     5.2  8.0  5.0     8.0  7.9  9.9     7.5  8.2  8.1
D20    3.4  5.2  4.9     4.9  6.3  6.1     3.6  5.1  6.3     5.2  5.2  7.1



For our procedure, at each point $x_i$, an estimator is selected among a collection of $d_n$ estimators. This collection is composed of the estimators based on a projection onto a wavelet basis up to level $j$, for $j = 1, \ldots, d_n$. In Figure 2 we represent the histograms of the selected levels for the estimation of $s_2(x)$ with the Haar basis, at a point where the function $s_2$ is nearly flat ($x_4 = 1/2$) and at a peak of $s_2$ ($x_2 = 1/4$). We also represent the histogram of the levels selected by procedure P2 for the estimation of $s_2$ with the Haar basis. We recall that this procedure selects one level for the estimation of the whole function.

Figure 2 clearly shows that, as expected, the level selected by our procedure is higher at points where the function to be estimated is "irregular".
Fig 2. Histograms of the levels selected by procedure P1 for $s_2$ at $x_4 = 1/2$ (left) and $x_2 = 1/4$ (middle), and of the levels selected by procedure P2 (right).



Procedure P2 selects a high level in order to estimate $s_2$ accurately near the peaks.

The simulation results show that, in most cases, the pointwise adaptive procedure P1 gives good results for the risk $\hat r(x)$. Our procedure performs better at points where the function is "irregular". Except for the function $s_1$ with the Haar basis, procedure P1 performs better than P2 in most cases. Whatever the basis and the function, our procedure performs better than P3, or as well as P3, in most cases.

4.2. Estimation of integral functionals

One observes $(Y_i, 1 \leq i \leq n)$ given by (4.8). We consider the problem of estimating $\int_0^1 s_j(x)g(x)\,dx$. In the first part of the study, $g = \mathbf{1}_{[0,H]}/H$, where $H = 1/4$, $1/32$ or $1/128$. The last value of $H$ is comparable to $1/n$. This problem was considered in Section 3.2.

In the second part of the study, $g$ equals $g_1$ or $g_2$, defined on $[0, 1]$ by

$$g_1(x) = \cos(64\pi x), \qquad g_2(x) = \cos(4\pi x).$$

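For all $(j,m) \in \mathcal{M}^2$, we choose the same values for $x_m$ and $x_{j,m}$ as in Section 4.1. Denoting, for all $m \in \mathcal{M}$, by $\pi_{S_m}$ the orthogonal projection onto $S_m$, one has

$$\sigma_m^2 = \|\pi_{S_m}(g)\|^2\frac{\sigma^2}{n}\quad\forall m \in \mathcal{M}, \qquad \sigma_{j,m}^2 = \|\pi_{S_j}(g) - \pi_{S_m}(g)\|^2\frac{\sigma^2}{n}\quad\forall j > m \in \mathcal{M}.$$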
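In the finite-dimensional regression model these norms are easy to compute: with the empirical scalar product $\langle u, v\rangle = \frac{1}{n}\sum_i u_i v_i$ and the Haar basis, $\pi_{S_m}(g)$ replaces the values $g(i/n)$ by their mean over each dyadic cell. The Python sketch below computes $\|\pi_{S_m}(g)\|^2$, $\sigma_m^2$ and $\sigma_{j,m}^2$ for $g = \mathbf{1}_{[0,H]}/H$, using the Pythagorean identity $\|\pi_{S_j}g - \pi_{S_m}g\|^2 = \|\pi_{S_j}g\|^2 - \|\pi_{S_m}g\|^2$ for nested spaces ($j \geq m$). The function names, the assumption that $n$ is a power of 2 and the equal-size cell convention are ours.

```python
import math


def projection_sq_norm(g_vals, m):
    """||pi_{S_m} g||^2 for <u, v> = (1/n) sum_i u_i v_i, where S_m contains the
    vectors constant on 2^m consecutive cells of n / 2^m design points."""
    n = len(g_vals)
    size = n // 2 ** m
    total = 0.0
    for k in range(2 ** m):
        mean = sum(g_vals[k * size:(k + 1) * size]) / size
        total += size * mean ** 2
    return total / n


def integral_functional_sigmas(h, sigma, n=256):
    """sigma_m^2 and sigma_{j,m}^2 for T(s) = (1/n) sum_i s(i/n) g(i/n), g = 1_[0,h] / h."""
    g_vals = [(1.0 / h if i / n <= h else 0.0) for i in range(1, n + 1)]
    d_n = int(round(math.log2(n)))
    sq = {m: projection_sq_norm(g_vals, m) for m in range(1, d_n + 1)}
    sigma_m2 = {m: sq[m] * sigma ** 2 / n for m in sq}
    # nested spaces: ||pi_j g - pi_m g||^2 = ||pi_j g||^2 - ||pi_m g||^2 for j >= m
    sigma_jm2 = {(j, m): (sq[j] - sq[m]) * sigma ** 2 / n
                 for j in sq for m in sq if j >= m}
    return sq, sigma_m2, sigma_jm2
```

For $H = 1/4$ and $n = 256$, $g$ is already constant on the level-2 cells, so $\|\pi_{S_m}(g)\|^2 = \|g\|^2 = 1/H = 4$ for every $m \geq 2$, while at level 1 the indicator is averaged with zeros and the squared norm drops to 2.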

We compare the estimator obtained with our procedure P1 with the estimators $\int_0^1\tilde s\,g$, where $\tilde s$ is obtained by procedure P2 or P3. These two procedures are still denoted by P2 and P3. We also compute the empirical estimator $\sum_{i=1}^n Y_i g(i/n)/n$. This procedure is denoted by P4. Throughout the section, we use the Haar basis for the simulations. In the following tables, we give the value of $100\,\hat r$, where

$$\hat r = \frac{1}{N}\sum_{l=1}^N\left|\widehat{T(s)}^{(l)} - T(s)\right|,$$

with $N = 5000$ and $\widehat{T(s)}^{(l)}$ the estimate of $T(s)$ based on the $l$-th simulated sample.




       H = 1/4                     H = 1/32
       P1    P2    P3    P4        P1    P2    P3    P4
s1     2.1   2.0   3.5   2.0       4.6   4.2   9.3   5.7
s2     1.7   2.1   1.4   2.1       3.3   5.7   2.6   5.7
s3     2.9   2.0   2.0   2.0       4.4   5.7   4.4   5.7






       H = 1/128
       P1    P2    P3    P4
s1     4.7   4.1   9.4   11.2
s2     3.4   6.2   2.6   11.3
s3     5.3   8.1   5.4   11.2



In most cases, our procedure is comparable to the best procedure. Whatever the value of $H$, the procedures P2 and P3 use the same estimator for the function $s$. The risk of P4 increases as $H$ becomes smaller, since this procedure averages the observations over a smaller sample. Our procedure P1 takes advantage of the regularity of the signal in a neighbourhood of the interval $[0,H]$ to average over a larger sample.
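The growth of the P4 risk as $H$ decreases can be read off its variance. A minimal sketch, assuming the design points $x_i = i/n$ and i.i.d. noise of variance $\sigma^2$ in (4.8) (neither is restated in this section) and a hypothetical sample size $n = 1024$: for $g = \mathbf{1}_{[0,H]}/H$ the estimator $\sum_i y_i g(x_i)/n$ only averages about $nH$ observations, so its variance equals $\sigma^2/(nH)$ whenever $nH$ is an integer.

```python
from fractions import Fraction


def p4_estimate(y, x, g):
    """Empirical estimator P4 of int s(x) g(x) dx: (1/n) * sum_i y_i g(x_i)."""
    return sum(yi * g(xi) for yi, xi in zip(y, x)) / len(y)


def p4_variance(n, H, sigma2=1):
    """Exact variance of P4 for g = 1_[0,H] / H under the assumed design x_i = i/n:
    Var = sigma2 / n^2 * sum_i g(x_i)^2 = sigma2 * #{i : x_i <= H} / (n H)^2."""
    count = sum(1 for i in range(1, n + 1) if Fraction(i, n) <= H)
    return sigma2 * Fraction(count) / (n * H) ** 2


n = 1024  # hypothetical sample size
for H in (Fraction(1, 4), Fraction(1, 32), Fraction(1, 128)):
    print(H, p4_variance(n, H))
# 1/4 1/256
# 1/32 1/32
# 1/128 1/8
```

Exact rational arithmetic is used so that the counting of design points in $[0, H]$ is not affected by rounding.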

In the following tables, we present the results for the estimation of the linear functionals $\int s(x)g_i(x)\,dx$, $i = 1, 2$.





g = g1

      | P1              | P2              | P3              | P4
s1    | 2.09 × 10^(-…)  | 2.24 × 10^(-…)  | 2.71 × 10^(-…)  | 0.7
s2    | 0.29            | 0.30            | 0.28            | 0.72
s3    | 0.36            | 0.30            | 0.31            | 0.74




g = g2

      | P1   | P2   | P3   | P4
s1    | 2.86 | 2.86 | 2.86 | 2.88
s2    | 0.77 | 0.72 | 1.00 | 0.72
s3    | 0.71 | 0.71 | 0.56 | 0.72



Our procedure is comparable to P2, and in most cases its risk is of the same order as that of the best procedure.

5. Proofs

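5.1. Proof of Theorem 1

We shall use the following lemma.

Lemma 2. For all $m \in \mathcal{M}$ and all $x > 0$,

$$P\Big(\widehat{\mathrm{Crit}}(m) > \mathrm{Crit}(m) + \sqrt{2x}\Big) \le \sum_{j>m,\, j\in\mathcal{M}} e^{-x_{j,m}}\, e^{-x/\sigma_{j,m}^2}.$$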



5.1.1. Proof of Lemma 2 



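We recall that, for all $m \in \mathcal{M}$, $\hat{s}_m = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)\phi_\lambda$. Since $T$ is a linear functional,

$$T(\hat{s}_m) = \sum_{\lambda\in\Lambda_m} Y(\phi_\lambda)\,T(\phi_\lambda).$$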




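Let $X \sim \mathcal{N}(\mu, v^2)$. For all $x > 0$,

$$P\big(|X - \mu| \ge \sqrt{2x}\big) \le e^{-x/v^2},$$

which implies that

$$P\big(|X| \ge |\mu| + \sqrt{2x}\big) \le e^{-x/v^2}. \qquad (5.9)$$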
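The inequality behind (5.9) is the Gaussian tail bound $P(|Z| \ge u) \le e^{-u^2/2}$ for $Z \sim \mathcal{N}(0,1)$ and $u \ge 0$. The short Python check below evaluates the exact two-sided tail with `math.erfc` on an arbitrary grid of values of $x$ and $v$; it is a numerical sanity check of (5.9), not a proof.

```python
from math import erfc, exp, sqrt


def gaussian_two_sided_tail(t, v):
    """P(|X - mu| >= t) for X ~ N(mu, v^2)."""
    return erfc(t / (v * sqrt(2.0)))


def bound_59(x, v):
    """Right-hand side of (5.9) at threshold sqrt(2x): exp(-x / v^2)."""
    return exp(-x / v ** 2)


def check_59(vs=(0.1, 1.0, 3.0), xs=(1e-4, 0.01, 0.5, 1.0, 5.0, 20.0)):
    """True if P(|X - mu| >= sqrt(2x)) <= exp(-x / v^2) on the whole grid."""
    return all(gaussian_two_sided_tail(sqrt(2.0 * x), v) <= bound_59(x, v)
               for v in vs for x in xs)


print(check_59())  # True
```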



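Since $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ for all $a, b \ge 0$, and since $T(\hat{s}_j) - T(\hat{s}_m)$ is Gaussian with mean $T(s_j) - T(s_m)$ and variance $\sigma_{j,m}^2$, we obtain from (5.9), with $H(j,m) = \sqrt{2x_{j,m}}\,\sigma_{j,m}$, that for all $(j,m) \in \mathcal{M}^2$ such that $j > m$ and all $x > 0$,

$$P\Big(|T(\hat{s}_j) - T(\hat{s}_m)| - H(j,m) > |T(s_j) - T(s_m)| + \sqrt{2x}\Big) \le e^{-x_{j,m}}\, e^{-x/\sigma_{j,m}^2}.$$

Finally,

$$P\Big(\widehat{\mathrm{Crit}}(m) > \mathrm{Crit}(m) + \sqrt{2x}\Big) \le P\Big(\exists\, j > m,\ j \in \mathcal{M} : |T(\hat{s}_j) - T(\hat{s}_m)| - H(j,m) > |T(s_j) - T(s_m)| + \sqrt{2x}\Big)$$

$$\le \sum_{j>m,\, j\in\mathcal{M}} P\Big(|T(\hat{s}_j) - T(\hat{s}_m)| - H(j,m) > |T(s_j) - T(s_m)| + \sqrt{2x}\Big).$$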

This concludes the proof of Lemma 2. 

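• We first consider the case where $\hat{m} < m^*$. Since $\mathrm{pen}(m) \ge 0$ for all $m \in \mathcal{M}$, and by definition of $\widehat{\mathrm{Crit}}(m)$, we have

$$\widehat{\mathrm{Crit}}(\hat{m}) \ge |T(\hat{s}_{m^*}) - T(\hat{s}_{\hat{m}})| - H(m^*, \hat{m}) \ge |T(\hat{s}_{\hat{m}}) - T(s)| - |T(\hat{s}_{m^*}) - T(s)| - H(m^*, \hat{m}).$$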

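Since

$$\widehat{\mathrm{Crit}}(\hat{m}) \le \widehat{\mathrm{Crit}}(m^*) + \frac{1}{n},$$

we obtain

$$|T(\hat{s}_{\hat{m}}) - T(s)| \le \widehat{\mathrm{Crit}}(m^*) + H(m^*, \hat{m}) + |T(\hat{s}_{m^*}) - T(s)| + \frac{1}{n}.$$

On the event $\{\hat{m} < m^*\}$,

$$H(m^*, \hat{m}) \le \sup_{j<m^*} H(m^*, j).$$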

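Hence, using Lemma 2, we obtain that for all $x > 0$, the probability of the event

$$\Big\{|T(\hat{s}_{\hat{m}}) - T(s)| > \mathrm{Crit}(m^*) + \sqrt{2x} + \sup_{j<m^*} H(m^*, j) + |T(\hat{s}_{m^*}) - T(s)| + \frac{1}{n}\Big\} \cap \{\hat{m} < m^*\}$$

is bounded by

$$\sum_{j>m^*,\, j\in\mathcal{M}} e^{-x_{j,m^*}}\, e^{-x/\sigma_{j,m^*}^2}. \qquad (5.10)$$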



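We now consider the case where $\hat{m} \ge m^*$. We recall that

$$\sigma_m^2 = \mathrm{var}(T(\hat{s}_m)) = \frac{\sigma^2}{n} \sum_{\lambda\in\Lambda_m} T^2(\phi_\lambda).$$

Using inequality (5.9), we obtain that for all $m \in \mathcal{M}$,

$$P\Big(|T(\hat{s}_m) - T(s)| > |T(s_m) - T(s)| + \sqrt{2x} + \sigma_m\sqrt{2x_m}\Big) \le \exp\Big(-\frac{(\sqrt{2x} + \sigma_m\sqrt{2x_m})^2}{2\sigma_m^2}\Big) \le e^{-x_m}\, e^{-x/\sigma_m^2}.$$

This implies that

$$P\Big(|T(\hat{s}_m) - T(s)| > |T(s_m) - T(s)| + \sqrt{2x} + \mathrm{pen}(m)\Big) \le e^{-x_m}\, e^{-x/\sigma_m^2}.$$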

We notice that for all $m \in \mathcal{M}$,

$$\widehat{\mathrm{Crit}}(m) \ge \mathrm{pen}(m),$$

since

$$\sup_{j\ge m}\Big[|T(\hat{s}_j) - T(\hat{s}_m)| - H(j,m)\Big] \ge |T(\hat{s}_m) - T(\hat{s}_m)| - H(m,m),$$

and the right-hand side of this inequality is equal to 0.
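A minimal Python sketch of the selection step, assuming that the empirical criterion has the form used in the argument above, $\widehat{\mathrm{Crit}}(m) = \sup_{j\ge m}\big[|T(\hat{s}_j) - T(\hat{s}_m)| - H(j,m)\big] + \mathrm{pen}(m)$ with $H(m,m) = 0$, and that $\hat{m}$ is an exact minimizer (the proof only requires $\widehat{\mathrm{Crit}}(\hat{m}) \le \widehat{\mathrm{Crit}}(m^*) + 1/n$). The estimates, penalties and thresholds below are hypothetical numbers, not values from the simulations.

```python
def crit_hat(T_hat, H, pen, m):
    """Assumed empirical criterion: sup_{j >= m} [ |T_hat[j] - T_hat[m]| - H(j, m) ] + pen[m].
    The term j = m equals 0 because H(m, m) = 0, so crit_hat(...) >= pen[m]."""
    sup_term = max(abs(T_hat[j] - T_hat[m]) - H(j, m) for j in range(m, len(T_hat)))
    return sup_term + pen[m]


def select_model(T_hat, H, pen):
    """Index minimizing crit_hat over the ordered collection 0 < 1 < ... < len(T_hat) - 1."""
    return min(range(len(T_hat)), key=lambda m: crit_hat(T_hat, H, pen, m))


# Hypothetical values of T(hat s_m) for five nested models, with penalties and thresholds.
T_hat = [0.0, 0.9, 1.0, 1.05, 0.95]
pen = [0.01, 0.02, 0.04, 0.08, 0.16]


def H(j, m):
    """Hypothetical threshold H(j, m); it must vanish for j = m."""
    return 0.0 if j == m else 0.1


print(select_model(T_hat, H, pen))  # 2
```

Models below index 2 are rejected because some finer estimate differs from theirs by more than the threshold, and finer models are rejected because of their larger penalty.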
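Using the inequalities

$$\mathrm{pen}(\hat{m}) \le \widehat{\mathrm{Crit}}(\hat{m}) \le \widehat{\mathrm{Crit}}(m^*) + \frac{1}{n},$$

we obtain

$$P\Big(|T(\hat{s}_{\hat{m}}) - T(s)| > |T(s_{\hat{m}}) - T(s)| + \sqrt{2x} + \widehat{\mathrm{Crit}}(m^*) + \frac{1}{n}\Big) \le \sum_{m\in\mathcal{M}} e^{-x_m}\, e^{-x/\sigma_m^2}.$$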

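We use the following inequality, which holds if $\hat{m} \ge m^*$:

$$|T(s_{\hat{m}}) - T(s)| \le \sup_{j\ge m^*} |T(s_j) - T(s)|,$$

and we apply Lemma 2 with $m = m^*$ to control $\widehat{\mathrm{Crit}}(m^*)$ by $\mathrm{Crit}(m^*)$. Hence the probability of the event

$$\Big\{|T(\hat{s}_{\hat{m}}) - T(s)| > \sup_{j\ge m^*} |T(s_j) - T(s)| + 2\sqrt{2x} + \mathrm{Crit}(m^*) + \frac{1}{n}\Big\} \cap \{\hat{m} \ge m^*\}$$

is bounded by

$$\sum_{m\in\mathcal{M}} e^{-x_m}\, e^{-x/\sigma_m^2} + \sum_{j>m^*,\, j\in\mathcal{M}} e^{-x_{j,m^*}}\, e^{-x/\sigma_{j,m^*}^2}. \qquad (5.11)$$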

Define

$$C_{m^*} = \mathrm{Crit}(m^*) + \sup_{j<m^*} H(m^*, j) + \sup_{j\ge m^*} |T(s_j) - T(s)| + \frac{1}{n}$$

and

$$X = |T(\hat{s}_{\hat{m}}) - T(s)|, \qquad Y = |T(\hat{s}_{m^*}) - T(s)|.$$



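It follows from (5.10) and (5.11) that for all $x > 0$,

$$P\big(X - Y > C_{m^*} + 2\sqrt{2x}\big) \le \sum_{m\in\mathcal{M}} e^{-x_m}\, e^{-x/\sigma_m^2} + 2\sum_{j>m^*,\, j\in\mathcal{M}} e^{-x_{j,m^*}}\, e^{-x/\sigma_{j,m^*}^2}. \qquad (5.12)$$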

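We have

$$E(X^p) = E\big(X^p\,\mathbf{1}_{X > Y + C_{m^*}}\big) + E\big(X^p\,\mathbf{1}_{X \le Y + C_{m^*}}\big)$$

$$\le E\Big[(X - Y - C_{m^*} + Y + C_{m^*})^p\,\mathbf{1}_{X > Y + C_{m^*}}\Big] + E\Big[(Y + C_{m^*})^p\,\mathbf{1}_{X \le Y + C_{m^*}}\Big]$$

$$\le 2^{p-1} E\Big[(X - Y - C_{m^*})^p\,\mathbf{1}_{X > Y + C_{m^*}}\Big] + 2^{p} E\Big[(Y + C_{m^*})^p\Big].$$

We have used the inequality $(a+b)^p \le 2^{p-1}(a^p + b^p)$, which holds for all $p \ge 1$ and $a, b \ge 0$.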

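Moreover, setting $(u)_+ = \max(u, 0)$ and substituting $t = (2\sqrt{2x})^p$,

$$E\big[(X - Y - C_{m^*})_+^p\big] = \int_0^\infty P\big((X - Y - C_{m^*})_+^p > t\big)\,dt = \int_0^\infty P\big((X - Y - C_{m^*})_+ > 2\sqrt{2x}\big)\, p\, 2^{p-1} (\sqrt{2})^p\, x^{p/2-1}\,dx.$$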

Hence, using (5.12), we get

$$E\big[(X - Y - C_{m^*})_+^p\big] \le C(p)\Big(\sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p + \sum_{j>m^*} e^{-x_{j,m^*}}\sigma_{j,m^*}^p\Big).$$
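The constant $C(p)$ comes from a Gamma integral. Bounding the probability in the integrand by (5.12), each term $e^{-x_m}e^{-x/\sigma_m^2}$ contributes, after the substitution $x = \sigma_m^2 u$,

$$\int_0^\infty e^{-x/\sigma_m^2}\, p\, 2^{p-1}(\sqrt{2})^p\, x^{p/2-1}\,dx = p\, 2^{3p/2-1}\,\sigma_m^p \int_0^\infty e^{-u} u^{p/2-1}\,du = p\, 2^{3p/2-1}\,\Gamma(p/2)\,\sigma_m^p,$$

and likewise with $\sigma_{j,m^*}$ in place of $\sigma_m$. Accounting for the factor 2 in front of the second sum in (5.12), one admissible choice is $C(p) = p\, 2^{3p/2}\,\Gamma(p/2)$.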



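We conclude that for all $p \ge 1$, there exists some constant $C(p) > 0$ such that

$$E\big[|T(\hat{s}_{\hat{m}}) - T(s)|^p\big] \le C(p)\Big\{\mathrm{Crit}(m^*)^p + E\big(|T(\hat{s}_{m^*}) - T(s)|^p\big)\Big\} + C(p)\Big[\sup_{j<m^*} H^p(m^*, j) + \sup_{j\ge m^*} |T(s_j) - T(s)|^p\Big]$$

$$+\, C(p)\Big(\sum_{m\in\mathcal{M}} e^{-x_m}\sigma_m^p + \sum_{j>m^*} e^{-x_{j,m^*}}\sigma_{j,m^*}^p + \frac{1}{n^p}\Big).$$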



Moreover, possibly enlarging $C(p)$,

$$E\big[|T(\hat{s}_{m^*}) - T(s)|^p\big] \le C(p)\Big[|T(s_{m^*}) - T(s)|^p + \sigma_{m^*}^p\Big].$$

Since

$$|T(s_j) - T(s)| \le |T(s_j) - T(s_{m^*})| + |T(s_{m^*}) - T(s)|,$$

we obtain

$$\sup_{j\ge m^*} |T(s_j) - T(s)|^p \le C(p)\Big(|T(s_{m^*}) - T(s)|^p + \sup_{j\ge m^*} |T(s_j) - T(s_{m^*})|^p\Big) \le C(p)\Big(|T(s_{m^*}) - T(s)|^p + \mathrm{Crit}(m^*)^p\Big).$$

This concludes the proof of Theorem 1.
5.2. Proof of Corollary 1

The proof follows from bounding the terms on the right-hand side of (2.4). In the following, $C$ denotes a positive constant which may vary from line to line; we indicate how these constants depend on the parameters involved in the problem.

Let $W_j$ be the orthogonal complement of $S_j$ in $S_{j+1}$: $S_{j+1} = W_j \oplus S_j$. Denote by $D_j(s)$ the projection of $s$ onto $W_j$.

It follows from [24, Theorem 3, p. 31] that there exists a constant $C(r)$ such that

$$\|(D_j(s))^{(r)}\|_\infty \le C(r)\, 2^{jr}\, \|D_j(s)\|_\infty.$$

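We recall that

$$D_j(s) = \sum_{k\in\mathbb{Z}} \beta_{j,k}\psi_{j,k},$$

where $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$ and $\beta_{j,k} = (s, \psi_{j,k})$.

Hence, since we have assumed that $\psi$ has a compact support,

$$\|D_j(s)\|_\infty \le C(r)\, 2^{j/2} \sup_{k\in\mathbb{Z}} |\beta_{j,k}|$$

and

$$\|(D_j(s))^{(r)}\|_\infty \le C(r)\, 2^{j(r+1/2)} \sup_{k\in\mathbb{Z}} |\beta_{j,k}|.$$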

Since $s \in B^{\alpha}_{\infty,q}([0,1])$ with $\|s\|_{\alpha,\infty,q} \le L$,

$$\sum_{j\ge0} 2^{jq(\alpha+1/2)} \sup_{k\in\mathbb{Z}} |\beta_{j,k}|^q \le L^q. \qquad (5.13)$$



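We define $B(m) = \sup_{j\in\mathcal{M},\, j>m} |s_j^{(r)}(x_0) - s_m^{(r)}(x_0)|$. Since $s_j - s_m = \sum_{l=m}^{j-1} D_l(s)$,

$$B(m) \le \sup_{j>m} \|s_j^{(r)} - s_m^{(r)}\|_\infty \le \sup_{j>m} \sum_{l=m}^{j-1} \|(D_l(s))^{(r)}\|_\infty \le C(r) \sum_{l\ge m} 2^{l(\alpha+1/2)} \sup_{k\in\mathbb{Z}} |\beta_{l,k}|\, 2^{-l(\alpha-r)}.$$

Using Hölder's inequality,

$$B(m) \le C(r)\Big(\sum_{l\ge m} 2^{lq(\alpha+1/2)} \big(\sup_{k\in\mathbb{Z}} |\beta_{l,k}|\big)^q\Big)^{1/q} \Big(\sum_{l\ge m} 2^{-lq'(\alpha-r)}\Big)^{1/q'},$$

where $1/q + 1/q' = 1$. It follows from (5.13) that

$$B(m) \le C(\alpha, r, q)\, L\, 2^{-m(\alpha-r)}.$$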
Note that for all $j > m$,

$$\sigma_{j,m}^2 = \mathrm{Var}\big(T(\hat{s}_j) - T(\hat{s}_m)\big) = \frac{\sigma^2}{n} \sum_{l=m}^{j-1} \sum_{k\in\mathbb{Z}} \big(\psi_{l,k}^{(r)}(x_0)\big)^2.$$
It follows from conditions (i) and (iv) that 



and that 



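Thus, for all $m$ in $\mathcal{M}$,

$$\mathrm{Crit}^p(m) \le C(p, \alpha, r, q)\Big(L^p\, 2^{-pm(\alpha-r)} + \Big(\frac{\sigma^2\, 2^{m(1+2r)} \ln(n)}{n}\Big)^{p/2}\Big).$$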



Let

$$m_0(n) = \Big[\frac{1}{\ln(2)} \ln\Big(\Big(\frac{nL^2}{\sigma^2 \ln(n)}\Big)^{\frac{1}{1+2\alpha}}\Big)\Big], \qquad (5.14)$$

where $[x]$ denotes the integer part of $x$.
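By construction, $2^{m_0(n)(1+2\alpha)} \le nL^2/(\sigma^2\ln(n)) < 2^{(m_0(n)+1)(1+2\alpha)}$, so $m_0(n)$ approximately balances a squared bias term $L^2 2^{-2m(\alpha-r)}$ against a variance-type term $\sigma^2 2^{m(1+2r)}\ln(n)/n$. The Python sketch below computes $m_0(n)$ and the ratio of these two terms; the form of the variance-type term is an assumption, and the values $n = 10\,000$, $L = \sigma = \alpha = 1$ are hypothetical.

```python
from math import floor, log


def m0(n, L, sigma, alpha):
    """m0(n) from (5.14): integer part of
    (1 / ln 2) * ln( (n L^2 / (sigma^2 ln n)) ** (1 / (1 + 2 alpha)) )."""
    K = n * L ** 2 / (sigma ** 2 * log(n))
    return floor(log(K) / ((1 + 2 * alpha) * log(2)))


def bias_variance_ratio(n, L, sigma, alpha, m):
    """Ratio of L^2 2^{-2m(alpha - r)} to sigma^2 2^{m(1 + 2r)} ln(n) / n.
    The index r cancels, leaving (n L^2 / (sigma^2 ln n)) * 2^{-m(1 + 2 alpha)}."""
    return n * L ** 2 / (sigma ** 2 * log(n)) * 2.0 ** (-m * (1 + 2 * alpha))


m = m0(10_000, L=1.0, sigma=1.0, alpha=1.0)
print(m, bias_variance_ratio(10_000, 1.0, 1.0, 1.0, m))  # m = 3, ratio in [1, 2**3)
```

At $m = m_0(n)$ the ratio always lies in $[1, 2^{1+2\alpha})$, so the two terms are of the same order up to a constant depending only on $\alpha$.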

One can easily show that if $n/(\sigma^2\ln(n)) \ge 2^{1+2\alpha}L^{-2}$ and $n^{2\alpha}\sigma^2\ln(n) \ge L^2$, then

$$m_0(n) \in \mathcal{M}.$$

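Hence,

$$\mathrm{Crit}(m^*) \le \mathrm{Crit}(m_0(n)) + \frac{1}{n},$$

which implies that

$$\mathrm{Crit}^p(m^*) \le C(p, \alpha, r, q)\, L^{\frac{p(1+2r)}{1+2\alpha}} \Big(\frac{\sigma^2 \ln(n)}{n}\Big)^{\frac{p(\alpha-r)}{1+2\alpha}}.$$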

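We next bound the other terms on the right-hand side of Theorem 1.

• For all $1 \le m \le d_n$,

$$|s_m^{(r)}(x_0) - s^{(r)}(x_0)|^p \le \Big(|s_{d_n}^{(r)}(x_0) - s^{(r)}(x_0)| + |s_{d_n}^{(r)}(x_0) - s_m^{(r)}(x_0)|\Big)^p \le C(p, \alpha, r, q)\Big(L^p\, 2^{-pd_n(\alpha-r)} + \sup_{j>m} |s_j^{(r)}(x_0) - s_m^{(r)}(x_0)|^p\Big).$$

This implies that

$$|s_{m^*}^{(r)}(x_0) - s^{(r)}(x_0)|^p \le C(p, \alpha, r, q)\Big(L^p\, 2^{-pd_n(\alpha-r)} + \mathrm{Crit}^p(m^*)\Big).$$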
• For all $1 \le m \le d_n$, $\sigma_m \le \mathrm{pen}(m) \le \mathrm{Crit}(m)$.

• For all $1 \le m \le d_n$,

$$\sup_{j<m} H(m, j) = \sup_{j<m} \sqrt{2x_{m,j}}\,\sigma_{m,j} \le \sqrt{2x_m}\,\sigma_m \le \mathrm{Crit}(m).$$

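This implies that

$$\mathrm{Crit}^p(m^*) + |s_{m^*}^{(r)}(x_0) - s^{(r)}(x_0)|^p + \sigma_{m^*}^p + \sup_{j<m^*} H^p(m^*, j) \le C(p, \alpha, r, q)\Big[L^{\frac{p(1+2r)}{1+2\alpha}} \Big(\frac{\sigma^2 \ln(n)}{n}\Big)^{\frac{p(\alpha-r)}{1+2\alpha}} + L^p\, 2^{-pd_n(\alpha-r)}\Big].$$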



Since $2^{-d_n} \le 2/n$ and $n^{2\alpha}\sigma^2 \ge L^2$,



On the other hand we have 



7,P/2 



which yields the desired bound. 



5.3. Proof of Corollary 2 

As in Corollary 1, the proof follows by bounding the terms on the right-hand side of (2.4). Let us first control $\sigma_m$ for all $m \in \mathcal{M}$. For $m \le m_n$,



fcez " fcez 

Since $\phi$ has a compact support,



x~k) I ipiT'x-k). 



E 



ipiOTx - k) 



<C\\ip\\ooHn, 



and 



sup 



(p(2'"a; - k) 



Ih„ 



<qi(pll^(2-'"AF„ 



Hence, for all m < m„, 



< < C-2" 
Moreover,

n 

It is easy to see that $\sigma_{j,m}^2 \le 2(\sigma_j^2 + \sigma_m^2)$ and that $\sigma_{m,m} = 0$. Hence



This implies that



j>m 

Since $T(s_{m_n+1}) = T(s)$, and since $|T(f) - T(g)| \le \|f - g\|_\infty$ for all $f, g \in L_2([0,1])$, we obtain, by the same computations as in the proof of Corollary 1, that for all $m \le m_n$,

$$\sup_{j>m} |T(s_m) - T(s_j)| \le \sup_{j>m} \|s_m - s_j\|_\infty + \|s_m - s\|_\infty \le C(\alpha, q)\, L\, 2^{-m\alpha}.$$

Let $m_0(n)$ be defined by (5.14). Since we assumed that $nL^2/(\sigma^2\ln(n)) > 1$,
mo(n) e N. If (nLVCT2i^(„)-)i/(i+2a) ^ 2™o(n) < and we get 

$$\mathrm{Crit}^p(m^*) \le C(p)\big(\mathrm{Crit}^p(m_0(n)) + n^{-p}\big).$$



Moreover,

$$\mathrm{Crit}^p(m^*) \le C(p)\big(\mathrm{Crit}^p(m_n + 1) + n^{-p}\big).$$



Also, since $T(s_{m_n+1}) = T(s)$,

$$|T(s_{m^*}) - T(s)| \le \sup_{j>m^*,\, j\in\mathcal{M}} |T(s_{m^*}) - T(s_j)| \le \mathrm{Crit}(m^*).$$

The other terms appearing in the upper bound for $E(|T(\hat{s}_{\hat{m}}) - T(s)|^p)$ given in Theorem 1 can be controlled as in the proof of Corollary 1.
5.4. Proof of Lemma 1

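It follows from the results of Donoho and Liu [13] that

$$\inf_{\hat{T}_n} \sup_{s\in\mathcal{H}_\alpha(L)} E\big((\hat{T}_n - T(s))^2\big) \ge C\,\omega_2\big(1/\sqrt{n}, T, \mathcal{H}_\alpha(L)\big)^2,$$

where $\omega_2(\varepsilon, T, \mathcal{F})$ denotes the modulus of continuity of the linear functional $T$ over the set $\mathcal{F}$ with respect to the $L_2$ norm, namely

$$\omega_2(\varepsilon, T, \mathcal{F}) = \sup\big\{|T(f_1) - T(f_0)| : f_0, f_1 \in \mathcal{F},\ \|f_1 - f_0\|_2 \le \varepsilon\big\}.$$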

In order to prove (3.6), we consider the functions $f_0 = 0$ and $f_1$ defined on $[0, 1]$ by:

$f_1(x) = \rho_n x^\alpha$ for $x \in [0, H_n]$,

$f_1(x) = \rho_n (2H_n - x)^\alpha$ for $x \in (H_n, 2H_n]$,

$f_1(x) = 0$ for $x > 2H_n$,

where

$$\rho_n = \Big(\frac{1+2\alpha}{2nH_n^{1+2\alpha}}\Big)^{1/2}.$$

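The condition $H_n \ge \big((1+2\alpha)/(2nL^2)\big)^{1/(1+2\alpha)}$ ensures that $\rho_n \le L$, hence $f_1 \in \mathcal{H}_\alpha(L)$. One can easily verify that $\|f_1 - f_0\|_2 = 1/\sqrt{n}$ and that $T(f_1) - T(f_0) = C(\alpha)/\sqrt{nH_n}$.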
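The two identities for the first construction can be checked numerically. A minimal Python sketch, assuming (as in the proof of (3.7)) that $T(f)$ is the mean of $f$ over $[0, H_n]$, with hypothetical values $n = 100$, $\alpha = 0.5$, $H_n = 0.1$; in closed form $T(f_1) = \rho_n H_n^\alpha/(1+\alpha) = \frac{1}{1+\alpha}\sqrt{(1+2\alpha)/(2nH_n)}$.

```python
from math import sqrt


def f1_tent(x, alpha, H, rho):
    """f_1 in the proof of (3.6): rho x^alpha on [0, H], rho (2H - x)^alpha on (H, 2H], 0 after."""
    if x <= H:
        return rho * x ** alpha
    if x <= 2 * H:
        return rho * (2 * H - x) ** alpha
    return 0.0


def midpoint_integral(f, a, b, N=100_000):
    """Composite midpoint rule for int_a^b f(x) dx."""
    h = (b - a) / N
    return h * sum(f(a + (i + 0.5) * h) for i in range(N))


n, alpha, H = 100, 0.5, 0.1  # hypothetical values
rho = sqrt((1 + 2 * alpha) / (2 * n * H ** (1 + 2 * alpha)))
l2_norm = sqrt(midpoint_integral(lambda x: f1_tent(x, alpha, H, rho) ** 2, 0.0, 2 * H))
T_f1 = midpoint_integral(lambda x: f1_tent(x, alpha, H, rho), 0.0, H) / H  # mean of f_1 on [0, H]
print(l2_norm, 1 / sqrt(n))
print(T_f1, sqrt((1 + 2 * alpha) / (2 * n * H)) / (1 + alpha))
```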

In order to prove (3.7), we consider the functions $f_0 = 0$ and $f_1$ defined on $[0, 1]$ by:

$f_1(x) = L(\gamma_n - x)^\alpha$ for $x \in [0, \gamma_n]$,

$f_1(x) = 0$ for $x > \gamma_n$,

where $\gamma_n = \big((1+2\alpha)/(nL^2)\big)^{1/(1+2\alpha)}$. Since we have assumed that $nL^2 \ge 1 + 2\alpha$, $\gamma_n \le 1$. We have $\|f_1 - f_0\|_2 = 1/\sqrt{n}$ and



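$$T(f_1) = \frac{1}{H_n}\int_0^{H_n} L(\gamma_n - x)^\alpha\,dx = \frac{L\big(\gamma_n^{1+\alpha} - (\gamma_n - H_n)^{1+\alpha}\big)}{H_n(1+\alpha)} = L\gamma_n^\alpha c_n^\alpha,$$

where $c_n \in [1 - H_n/\gamma_n, 1]$. The last equality is obtained by the Taylor-Lagrange formula. For $H_n \le \frac{1}{2}(1+2\alpha)^{1/(1+2\alpha)}(nL^2)^{-1/(1+2\alpha)}$, we have $H_n/\gamma_n \le 1/2$, which leads to $T(f_1) \ge C(\alpha)L\gamma_n^\alpha$; hence (3.7) holds.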
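A short Python check of the second construction, using the closed form of $T(f_1)$ above and hypothetical values $n = 1000$, $L = 1$, $\alpha = 0.5$, $H_n = 0.01$ (which satisfy $H_n \le \gamma_n/2$). It recovers $c_n$ from $T(f_1) = L\gamma_n^\alpha c_n^\alpha$ and confirms that $\|f_1\|_2^2 = 1/n$ and $c_n \in [1 - H_n/\gamma_n, 1]$.

```python
def second_construction(n, L, alpha, H):
    """Quantities from the proof of (3.7), taking T(f) to be the mean of f over [0, H]."""
    gamma = ((1 + 2 * alpha) / (n * L ** 2)) ** (1 / (1 + 2 * alpha))
    if H > gamma:
        raise ValueError("the closed form below requires H <= gamma")
    sq_norm = L ** 2 * gamma ** (1 + 2 * alpha) / (1 + 2 * alpha)    # ||f_1||_2^2, equal to 1/n
    T_f1 = L * (gamma ** (1 + alpha) - (gamma - H) ** (1 + alpha)) / (H * (1 + alpha))
    c_n = (T_f1 / (L * gamma ** alpha)) ** (1 / alpha)                # T(f_1) = L gamma^alpha c_n^alpha
    return gamma, sq_norm, T_f1, c_n


gamma, sq_norm, T_f1, c_n = second_construction(n=1000, L=1.0, alpha=0.5, H=0.01)
print(gamma, sq_norm, c_n)  # c_n lies in [1 - H/gamma, 1] and is at least 1/2
```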

5.5. Proof of Corollary 3 

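For all $m \in \mathcal{M}$, we set $\Lambda_m = \{0, 1, \dots, 2^m - 1\}^d$, and for all $\lambda = (k_1, \dots, k_d) \in \Lambda_m$, let $I_\lambda = [k_1/2^m, (k_1+1)/2^m[\, \times \dots \times [k_d/2^m, (k_d+1)/2^m[$ and $\phi_\lambda = 2^{md/2}\mathbf{1}_{I_\lambda}$.

Then $\{\phi_\lambda, \lambda \in \Lambda_m\}$ is an orthonormal basis of $S_m$. It is easy to verify that for all $m \in \mathcal{M}$,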



and that for all $(j, m) \in \mathcal{M}^2$,



It follows from the definition of $x_m$ and $x_{j,m}$ that



Moreover, 



and 



$$\sigma_{m^*} + \sup_{j<m^*} \sqrt{2x_{m^*,j}}\,\sigma_{m^*,j} \le C\,\mathrm{pen}(m^*)$$



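$$|s_{m^*}(x_0) - s(x_0)|^p \le C(p) \sup_{j\ge m^*} |s_j(x_0) - s_{m^*}(x_0)|^p + C(p)\,|s_{d_n}(x_0) - s(x_0)|^p.$$

Hence, it follows from Theorem 1 that, possibly enlarging $C(p)$,

$$E\big(|\hat{s}_{\hat{m}}(x_0) - s(x_0)|^p\big) \le C(p)\Big(\mathrm{Crit}^p(m^*) + \frac{1}{n^p} + |s_{d_n}(x_0) - s(x_0)|^p\Big).$$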



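For all $m \in \mathcal{M}$,

This implies that

$$s_m(x_0) - s(x_0) = \frac{1}{|I_\lambda(x_0)|}\int_{I_\lambda(x_0)} \big(s(x) - s(x_0)\big)\,dx,$$

where $I_\lambda(x_0)$ is the set $I_\lambda$ that contains $x_0$. Hence, if $s \in \mathcal{H}_\alpha(L)$,

$$|s_m(x_0) - s(x_0)| \le L\, 2^{-m\alpha}.$$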
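The identity above says that $s_m(x_0)$ is the mean of $s$ over the dyadic cube containing $x_0$. A small Python sketch in dimension $d = 2$ computes this mean with a midpoint rule; the test signal $s(x) = x_1 + x_2$, the point $x_0$ and the quadrature size are hypothetical choices. For this Lipschitz signal, $|s_m(x_0) - s(x_0)| \le 2 \cdot 2^{-m}$, which illustrates the order $2^{-m\alpha}$ with $\alpha = 1$.

```python
from itertools import product
from math import floor


def dyadic_cube_corner(x0, m):
    """Lower corner of the cube I_lambda = prod_i [k_i 2^-m, (k_i + 1) 2^-m) containing x0."""
    return tuple(floor(c * 2 ** m) / 2 ** m for c in x0)


def s_m_at(s, x0, m, pts=64):
    """s_m(x0): with the orthonormal basis phi_lambda = 2^{md/2} 1_{I_lambda}, the projection of s
    onto S_m evaluated at x0 is the mean of s over the cube containing x0.
    The mean is approximated by a midpoint rule on pts^d nodes."""
    d = len(x0)
    lo = dyadic_cube_corner(x0, m)
    h = 2.0 ** (-m) / pts
    total = sum(s(tuple(lo[i] + (idx[i] + 0.5) * h for i in range(d)))
                for idx in product(range(pts), repeat=d))
    return total / pts ** d


def s(x):
    """Hypothetical Lipschitz test signal on [0, 1]^2."""
    return x[0] + x[1]


x0 = (0.3, 0.6)
print(s_m_at(s, x0, m=2), s(x0))  # approximately 1.0 and 0.9
```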

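Let

$$m_1(n) = \Big[\frac{1}{\ln(2)} \ln\Big(\Big(\frac{nL^2}{\ln(n)}\Big)^{\frac{1}{d+2\alpha}}\Big)\Big].$$

Since $m_1(n) \in \mathcal{M}$ as soon as $n/\ln(n) \ge 2^{d+2\alpha}L^{-2}$ and $L^2 \le n^{2\alpha/d}\ln(n)$,

$$\mathrm{Crit}^p(m^*) \le C(p)\big(\mathrm{Crit}^p(m_1(n)) + n^{-p}\big) \le C(p, \alpha, d, \sigma)\, L^{\frac{pd}{d+2\alpha}} \Big(\frac{\ln(n)}{n}\Big)^{\frac{p\alpha}{d+2\alpha}}.$$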
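Hence, for $n$ large enough,

$$E\big(|\hat{s}_{\hat{m}}(x_0) - s(x_0)|^p\big) \le C(p, \alpha, d, \sigma)\, L^{\frac{pd}{d+2\alpha}} \Big(\frac{\ln(n)}{n}\Big)^{\frac{p\alpha}{d+2\alpha}}.$$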
This concludes the proof of Corollary 3. 

Acknowledgment 

We would like to thank the Associate Editor, the referees and A. Tsybakov for 
their helpful comments. 

References 

[1] Baraud, Y. (2000). Model selection for regression on a fixed design. Probab. Theory Related Fields 117, 467-493. MR1777129

[2] Baraud, Y. (2002). Model selection for regression on a random design. ESAIM Probab. Statist. 6, 127-146. MR1918295

[3] Barron, A.R., Birge, L., Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113, 301-415. MR1679028

[4] Birge, L. (2001). An alternative point of view on Lepski's method. State of the art in probability and statistics (Leiden, 1999), 113-133, IMS Lecture Notes Monogr. Ser. 36, Inst. Math. Statist., Beachwood, OH. MR1836557

[5] Birge, L., Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3, no. 3, 203-268. MR1848946

[6] Birge, L., Massart, P. (1997). From model selection to adaptive estimation. Festschrift for Lucien Le Cam, 55-87, Springer, New York. MR1462939

[7] Brown, L., Low, M. (1996). A constrained risk inequality with applications to nonparametric functional estimation. Ann. Statist. 24, no. 6, 2524-2535. MR1425965

[8] Cai, T., Low, M. (2003). A note on nonparametric estimation of linear functional. Ann. Statist. 31, no. 4, 1140-1153. MR2001645

[9] Cai, T., Low, M. (2004). Minimax estimation of linear functionals over nonconvex parameter spaces. Ann. Statist. 32, no. 2, 552-576. MR2060169

[10] Cai, T., Low, M. (2005a). Adaptive estimation of linear functionals under different performance measures. Bernoulli 11, no. 2, 341-358. MR2132730

[11] Cai, T., Low, M. (2005b). On adaptive estimation of linear functionals. Ann. Statist. 33, no. 5, 2311-2343. MR2211088

[12] Donoho, D. (1994). Statistical estimation and optimal recovery. Ann. Statist. 22, no. 1, 238-270. MR1272082








[13] Donoho, D., Liu, R. (1991). Geometrizing rates of convergence. II, III. Ann. Statist. 19, no. 2, 633-667, 668-701. MR1105839

[14] Golubev, Y., Levit, B. (2004). An oracle approach to adaptive estimation of linear functionals in a Gaussian model. Mathematical Methods of Statistics 13, no. 4, 392-408. MR2126747

[15] Hardle, W., Kerkyacharian, G., Picard, D., Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. Lecture Notes in Statistics 129, Springer. MR1618204

[16] Ibragimov, I.A., Khas'minskii, R.Z. (1984). On nonparametric estimation of the value of a linear functional in white Gaussian noise. Teor. Veroyatn. Primen. 29, no. 1, 18-32. MR0739497

[17] Klemela, J., Tsybakov, A. (2001). Sharp adaptive estimation of linear functionals. Ann. Statist. 29, no. 6, 1567-1600. MR1891739

[18] Laurent, B., Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28, no. 5, 1302-1338. MR1805785

[19] Lepski, O. (1990). On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl. 35, 454-466. MR1091202

[20] Lepski, O. (1991). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36, no. 4, 682-697. MR1147167

[21] Lepski, O. (1992). Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation. Theory Probab. Appl. 37, no. 3, 433-448. MR1214353

[22] Lepski, O., Mammen, E., Spokoiny, V. (1997). Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25, no. 3, 929-947. MR1447734

[23] Lepski, O., Spokoiny, V. (1997). Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist. 25, no. 6, 2512-2546. MR1604408

[24] Meyer, Y. (1990). Ondelettes, 1. Hermann, Paris. MR1085487



